arxiv: v2 [cs.lg] 4 Sep 2014


Classification with Sparse Overlapping Groups

Nikhil S. Rao, Robert D. Nowak
Department of Electrical and Computer Engineering
University of Wisconsin-Madison

Christopher R. Cox, Timothy T. Rogers
Department of Psychology
University of Wisconsin-Madison
crcox@wisc.edu, ttrogers@wisc.edu

Abstract

Classification with a sparsity constraint on the solution plays a central role in many high dimensional machine learning applications. In some cases, the features can be grouped together, so that entire subsets of features can be selected or not selected. In many applications, however, this can be too restrictive. In this paper, we are interested in a less restrictive form of structured sparse feature selection: we assume that while features can be grouped according to some notion of similarity, not all features in a group need be selected for the task at hand. When the groups are comprised of disjoint sets of features, this is sometimes referred to as the sparse group lasso, and it allows for working with a richer class of models than traditional group lasso methods. Our framework generalizes the conventional sparse group lasso further by allowing for overlapping groups, an additional flexibility needed in many applications and one that presents further challenges. The main contribution of this paper is a new procedure called the Sparse Overlapping Group (SOG) lasso, a convex optimization program that automatically selects similar features for classification in high dimensions. We establish model selection error bounds for SOGlasso classification problems under a fairly general setting. In particular, the error bounds are the first such results for classification using the sparse group lasso. Furthermore, the general SOGlasso bound specializes to results for the lasso and the group lasso, some known and some new.
The SOGlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features, source localization problems in Magnetoencephalography (MEG), and analyzing gene activation patterns in microarray data analysis. Experiments with real and synthetic data demonstrate the advantages of SOGlasso compared to the lasso and group lasso.

1. Introduction

Binary classification plays a major role in many machine learning and signal processing applications. In many modern applications where the number of features far exceeds the number of samples, one typically wishes to select only a few features, meaning only a few coefficients are non zero in the solution [1]. This corresponds to the case of searching for sparse solutions. The notion of sparsity prevents over-fitting and leads to more interpretable solutions in high dimensional machine learning, and has been extensively studied in (Bach, 2010; Plan and Vershynin, 2013; Negahban et al., 2012; Bunea, 2008), among others.

[1] A zero coefficient in the solution implies the corresponding feature is not selected.

In many applications, we wish to impose structure on the sparsity pattern of the coefficients recovered. In particular, often it is known a priori that the optimal sparsity pattern will tend to involve clusters or groups of coefficients, corresponding to pre-existing groups of features. The form of the groups is known, but the subset of groups that is relevant to the classification task at hand is unknown. This prior knowledge reduces the space of possible sparse coefficients, thereby potentially leading to better results than simple lasso methods. In such cases, the group lasso, with or without overlapping groups (Yuan and Lin, 2006), is used to recover the coefficients. The group lasso forces all the coefficients in a group to be active at once: if a coefficient is selected for the task at hand, then all the coefficients in that group are selected. When the groups overlap, a modification of the penalty allows one to recover coefficients that can be expressed as a union of groups (Jacob et al., 2009; Obozinski et al., 2011).

While the group lasso has enjoyed tremendous success in high dimensional feature selection applications, we are interested in a much less restrictive form of structured feature selection for classification. Suppose that the features can be arranged into (possibly) overlapping groups based on some notion of similarity, depending on the application. The notion of similarity can be loosely defined, and it is used to reflect the prior knowledge that if a feature is relevant for the learning task at hand, then features similar to it may also be relevant. It is known that while many features may be similar to each other, not all similar features are relevant for the specific learning problem.
We propose a new procedure called the Sparse Overlapping Group (SOG) lasso to reflect this form of structured sparsity. As an example, consider the task of identifying relevant genes that play a role in predicting a disease. Genes are organized into pathways (Subramanian et al., 2005), but not every gene in a pathway might be relevant for prediction. At the same time, it is reasonable to assume that if a gene from a particular pathway is relevant, then other genes from the same pathway may also be relevant. In such applications, the group lasso may be too constraining while the lasso may be too under-constrained.

1.1 Model and Results

We first present the main results of this paper at a glance. Uppercase and lowercase bold letters indicate matrices and vectors respectively. We assume a sparse learning framework, with a feature matrix $\boldsymbol{\Phi} \in \mathbb{R}^{n \times p}$, $n \ll p$. We assume each element of $\boldsymbol{\Phi}$ to be distributed as a standard Gaussian random variable. Assuming the data to arise from a Gaussian distribution simplifies analysis, and allows us to leverage tools from existing literature. Later in the paper, we will allow for correlations in the features as well, reflecting a more realistic setting. In the results that follow, $C$ is a positive constant, the value of which can be different from one result to the other. We focus on binary classification settings, and assume that the observations $y_i \in \{-1, +1\}$, $i = 1, 2, \ldots, n$, are randomly distributed according to the model (Plan and Vershynin, 2013)

$$\mathbb{E}[y_i \mid \boldsymbol{\phi}_i] = f(\langle \boldsymbol{\phi}_i, \mathbf{x}^\star \rangle), \tag{1}$$

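where $\boldsymbol{\phi}_i$ is the $i$th row of $\boldsymbol{\Phi}$, corresponding to the features of the $i$th data point, $\mathbf{x}^\star$ is the true coefficient vector of interest, and $f$ is a function mapping from $\mathbb{R}$ to $[-1, +1]$. The argument of $f$ is the Euclidean inner product: $\langle \boldsymbol{\phi}_i, \mathbf{x}^\star \rangle = \boldsymbol{\phi}_i^T \mathbf{x}^\star$. The function $f$ need not be known precisely. We only assume that, for $g \sim \mathcal{N}(0, 1)$, it satisfies

$$\mathbb{E}[f(g)\, g] > 0. \tag{2}$$

Without loss of generality, we assume $\mathbf{x}^\star$ to have unit Euclidean norm, since the normalization can be absorbed into the function $f$. The value

$$\sigma_f := \frac{1}{\mathbb{E}[f(g)\, g]} \tag{3}$$

quantifies the strength of the correlation between the labels $y_i$ and the inner products $\langle \boldsymbol{\phi}_i, \mathbf{x}^\star \rangle$. It plays the role of the noise level and will appear in our error bounds, but it need not be known to compute our proposed estimator of $\mathbf{x}^\star$.

This set-up allows for the consideration of a very general setting for classification, and subsumes many interesting cases. For example, the logistic model

$$P(y_i = 1) = \frac{\exp(\beta \langle \boldsymbol{\phi}_i, \mathbf{x}^\star \rangle)}{1 + \exp(\beta \langle \boldsymbol{\phi}_i, \mathbf{x}^\star \rangle)} \tag{4}$$

is equivalent to $f(\langle \boldsymbol{\phi}_i, \mathbf{x}^\star \rangle) = \tanh(\beta \langle \boldsymbol{\phi}_i, \mathbf{x}^\star \rangle / 2)$, for any constant $\beta > 0$, yielding a corresponding [2]

$$\sigma_f = \left( \frac{\beta}{2}\, \mathbb{E}\left[\operatorname{sech}^2(\beta g / 2)\right] \right)^{-1} \le \frac{6}{\min\{\beta, 1\}}.$$

The constant $\beta$ accounts for the fact that we consider $\|\mathbf{x}^\star\| = 1$. Indeed, for a general vector $\mathbf{z}$, we can write $\langle \boldsymbol{\phi}_i, \mathbf{z} \rangle = \beta \langle \boldsymbol{\phi}_i, \mathbf{z} / \|\mathbf{z}\| \rangle$, where $\beta = \|\mathbf{z}\|$. The second argument in the inner product is now a vector of unit norm, and it gives rise to the expression in (4). The framework also allows for the quantized 1-bit measurement model $f(\langle \boldsymbol{\phi}_i, \mathbf{x}^\star \rangle) = \operatorname{sign}(\langle \boldsymbol{\phi}_i, \mathbf{x}^\star \rangle)$, which can be seen as the limiting case of the logistic model as $\beta \to \infty$, and with $\sigma_f = \sqrt{\pi / 2}$.

We work with this general formulation since, for classification problems of interest, the logistic (or any other) model may be chosen somewhat arbitrarily. Existing theoretical results often apply only to the chosen model (for example, (Negahban et al., 2012)). Our estimator only requires that the observations are correlated with the features, in the sense of (3). In any such case, the underlying form of $f$ need not be known to compute our estimator of $\mathbf{x}^\star$, and it will enter the error bounds only through $\sigma_f$.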
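As a numerical sanity check on (3), the following minimal sketch estimates $\sigma_f = 1/\mathbb{E}[f(g)g]$ by Monte Carlo. It assumes the logistic link takes the form $f(t) = \tanh(\beta t/2)$, which follows from (4) via $\mathbb{E}[y_i] = 2P(y_i = 1) - 1$. The function names, sample size, and seed are illustrative choices and do not come from the paper.

```python
import numpy as np

def sigma_f(f, num_samples=1_000_000, seed=0):
    """Monte Carlo estimate of sigma_f = 1 / E[f(g) g] for g ~ N(0, 1), as in (3)."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(num_samples)
    correlation = np.mean(f(g) * g)  # estimate of E[f(g) g], assumed positive as in (2)
    return 1.0 / correlation

def logistic_link(beta):
    """f(t) = tanh(beta * t / 2), the link assumed to correspond to the logistic model (4)."""
    return lambda t: np.tanh(beta * t / 2.0)

# Compare the estimate with the upper bound 6 / min(beta, 1) for a few values of beta.
for beta in (0.5, 1.0, 5.0):
    estimate = sigma_f(logistic_link(beta))
    print(f"beta={beta}: sigma_f ~ {estimate:.3f}, bound = {6.0 / min(beta, 1.0):.3f}")

# 1-bit model: f = sign, for which E[|g|] = sqrt(2 / pi).
sigma_one_bit = sigma_f(np.sign)
print(f"1-bit: sigma_f ~ {sigma_one_bit:.3f}, sqrt(pi/2) = {np.sqrt(np.pi / 2):.3f}")
```

Larger values of $\beta$ correspond to less noisy labels and hence smaller $\sigma_f$, with the 1-bit model as the limiting case.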
We are interested in the following form of structured sparsity. Assume that the features can be organized into $K$ possibly overlapping groups, each consisting of $L$ features, based on a user-defined measure of similarity, depending on the application. Moreover, assume that if a certain feature is relevant for the learning task at hand, then features similar to it may also be relevant. Note that we assume groups of equal size $L$ for convenience. It is easy to relax this assumption. These assumptions suggest a structured pattern of sparsity in the coefficients wherein a subset of the groups are relevant to the learning task, and within the relevant groups a subset of the features are selected. In other words, $\mathbf{x}^\star \in \mathbb{R}^p$ has the following structure:

- its support is localized to a union of a subset of the groups, and
- its support is localized to a sparse subset within each such group.

[2] See Corollary 3.3 in (Plan and Vershynin, 2013) for a derivation.

Armed with these preliminaries, we state the theoretical sample complexity bounds proved later in the paper. If $\mathbf{x}^\star \in \mathbb{R}^p$ has $k \le K$ non-zero groups and $l \le L$ non-zero coefficients within each non-zero group, then $O\left(\sigma_f^2\, k \left(\log(K/k) + l \log(L/l) + l\right)\right)$ independent Gaussian measurements of the form (1) are sufficient to accurately estimate $\mathbf{x}^\star$ by solving a convex program.

The statement above merits further explanation. We show in Lemma 12 (Section 4.1) via a combinatorial argument that to estimate any vector with parameters $k, K, l, L$ as stated above, $O\left(\sigma_f^2\, k \left(\log(K/k) + l \log(L/l) + l\right)\right)$ samples would suffice. However, looking for vectors with these properties amounts to solving a non-convex program. When the groups do not overlap, we show that the solution to a convex program also succeeds in accurately estimating $\mathbf{x}^\star$ using the same number of measurements. When the groups overlap, we show that the measurement bound holds with an additional factor of $R^2$, where $R \ge 1$ is the maximum number of groups that contain any one feature. In most applications of interest (e.g., the fMRI example discussed below), $R$ is small. Nonetheless, we also show that no matter how many groups a particular coefficient belongs to, the sample complexity of our proposed estimator is never greater than that of the standard lasso or the overlapping group lasso. These statements will be made more precise in the sequel.

1.2 Motivation: The SOGlasso for Multitask Learning

The SOGlasso is motivated in part by multitask learning applications.
The group lasso is a commonly used tool in multitask learning, and it encourages the same set of features to be selected across all tasks. We wish to focus on a less restrictive version of multitask learning, where the main idea is to encourage selection of features that are similar, but not identical, across tasks. This is accomplished by defining subsets of similar features and searching for solutions that select only a few subsets (common across tasks) and a sparse number of features within each subset (possibly different across tasks). Figure 1 shows an example of the patterns that typically arise in sparse multitask learning applications, along with the one we are interested in.

A major application that we are motivated by is the analysis of multi-subject fMRI data, where the goal is to predict a cognitive state from measured neural activity using voxels as features. Because brains vary in size and shape, neural structures can be aligned only crudely. Moreover, neural codes can vary somewhat across individuals (Feredoes et al., 2007). Thus, neuroanatomy provides only an approximate guide as to where relevant information is located across individuals: a voxel useful for prediction in one participant suggests the general anatomical neighborhood where useful voxels may be found, but not the precise voxel. Past work in inferring sparsity patterns across subjects has involved the use of groupwise regularization (van Gerven et al., 2009), using the logistic lasso to infer sparsity patterns without taking into account the relationships across different subjects (Ryali et al., 2010), or using the elastic net penalty to account for groupings among coefficients (Rish et al., 2012). These methods do not take into account both the common macrostructure and the differences in microstructure across brains, whereas the SOGlasso allows one to model both the commonalities and the differences across brains. Figure 2 sheds light on the motivation, and on the grouping of voxels across brains into overlapping groups.

Figure 1: A comparison of different sparsity patterns in the multitask learning setting. Panels: (a) sparse; (b) group sparse; (c) group sparse plus sparse; (d) group sparse and sparse. Panel (a) shows a standard sparsity pattern. An example of group sparse patterns promoted by Glasso (Yuan and Lin, 2006) is shown in panel (b). In panel (c), we show the patterns considered in (Jalali et al., 2010). Finally, in panel (d), we show the patterns we are interested in in this paper. The groups are sets of rows of the matrix, and can overlap with each other.

Figure 2: SOGlasso for fMRI inference. The figure shows three brains, and voxels in a particular anatomical region are grouped together, across all individuals (red and green ellipses). For example, the green ellipse in the brains represents a single group. The groups denote anatomically similar regions in the brain that may be co-activated. However, within activated regions, the exact location and number of voxels may differ, as seen from the green spots.

1.3 Our Contributions

In this paper, we consider binary classification with a constraint on the structure of the sparsity pattern of the coefficients. We assume that the coefficients can be arranged (according to a predefined notion of similarity) into (overlapping) groups. Not only are only a few groups selected, but the selected groups themselves are also sparse. In this sense, our constraint can be seen as an extension of the sparse group lasso (Simon et al., 2013) to overlapping groups, where the sparsity pattern lies in a union of groups. We are mainly interested in classification problems, but the method can also be applied to regression settings by making an appropriate change to the loss function.

We consider a union-of-groups formulation as in (Jacob et al., 2009), but with an additional sparsity constraint on the selected groups. To this end, we analyze the Sparse Overlapping Group (SOG) lasso, where the overlapping groups might correspond to coefficients of features arbitrarily grouped according to the notion of similarity. We also consider a very general classification setting, and do not make restrictive assumptions on the observation model.

We introduce a constraint that promotes sparsity patterns that can be expressed as a union of sparsely activated groups. We show that the constraint is a tight convex relaxation of the set of coefficients having the sparsity pattern we are interested in. The main contribution of this paper is a theoretical analysis of the model selection consistency of the SOGlasso estimator, under a very general binary classification setting. Based on certain parameter values, our method reduces to other known cases of penalization for sparse high dimensional recovery. Specifically, under the logistic regression model, our method generalizes the group lasso (Meier et al., 2008; Jacob et al., 2009), and also extends to handle groups that can arbitrarily overlap with each other.
We also recover results for the lasso for logistic regression (Bunea, 2008; Negahban et al., 2012; Plan and Vershynin, 2013; Bach, 2010). In this sense, our work unifies the lasso, the group lasso, and the sparse group lasso, and extends them to handle overlapping groups. At the same time, our methods apply to settings beyond logistic regression, and include a far richer class of models. To the best of our knowledge, this is the first paper that provides such a unified theory and sample complexity bounds for all these methods.

In the case of linear regression and multitask learning, the authors in (Sprechmann et al., 2010, 2011) consider a similar situation with non overlapping subsets of features. We allow the groups of features to overlap arbitrarily. When the groups overlap, the methods mentioned above suffer from a drawback: entire groups are set to zero, in effect zeroing out many coefficients that might be relevant to the tasks at hand. This has undesirable effects in many applications of interest, and the authors in (Jacob et al., 2009) propose a version of the group lasso to circumvent this issue.

We also test our regularizer on both toy and real datasets. Our experiments reinforce our theoretical results, and demonstrate the advantages of the SOGlasso over standard lasso and group lasso methods, when the features can indeed be grouped according to some notion of similarity. We show that the SOGlasso is especially useful in multitask Functional Magnetic Resonance Imaging (fMRI) applications, and gene selection applications in computational biology.

To summarize, the main contributions of this paper are the following:

1. New regularizers for structured sparsity: We propose the Sparse Overlapping Group (SOG) lasso, a convex optimization problem that encourages the selection of coefficients that are both within- and across-group sparse. The groups can arbitrarily overlap, and the pattern obtained can be represented as a union of a small number of groups. This generalizes other known methods, and provides a common regularizer that can be used for any structured sparse problem with two levels of hierarchy [3]: groups at a higher level, and singletons at the lower level.

2. New theory for classification with structured sparsity: We provide a theoretical analysis for the model selection consistency of the SOGlasso estimator, for binary classification. The general results we obtain specialize to the lasso, the group lasso (with or without overlapping groups) and the sparse group lasso. We also make minimal assumptions on the measurement model, allowing for theory that is applicable in a wide range of classification settings. We obtain a bound on the sample complexity of the SOGlasso under both independent and correlated Gaussian measurement designs, and this in turn also translates to corresponding results for the lasso and the group lasso. In this sense, we obtain a unified theory for performing structured feature selection in high dimensions.

3. Applications: A major motivating application for this work is the analysis of multi-subject fMRI data. We apply the SOGlasso to fMRI data, and show that the results we obtain not only yield lower errors on hold-out test sets compared to previous methods, but also lead to more interpretable results. To show its applicability to other domains, we also apply the method to breast cancer data to detect genes that are relevant in the prediction of metastasis in breast cancer tumors.

In (Rao et al., 2013), the authors introduced the SOGlasso problem, emphasizing the motivating fMRI application. The authors also derived theoretical consistency results under the linear regression setting.
This paper further elaborates on the reasons for considering the SOGlasso penalty, and derives consistency results for classification. We also present some novel applications in computational biology where similar notions can be applied to achieve significant gains over existing methods. Our work here presents novel results for the group lasso with potentially overlapping groups, as well as the sparse group lasso for classification settings, as special cases of the theory we develop.

1.4 Past Work

When the groups of features do not overlap, (Simon et al., 2013) proposed the Sparse Group Lasso (SGL) to recover coefficients that are both within- and across-group sparse. SGL and its variants for multitask learning have found applications in character recognition (Sprechmann et al., 2011, 2010), climate and oceanology applications (Chatterjee et al., 2011), and in gene selection in computational biology (Simon et al., 2013). In (Jenatton et al., 2011), the authors extended the method to handle tree structured sparsity patterns, and showed that the resulting optimization problem admits an efficient implementation in terms of proximal point operators. Along related lines, the exclusive lasso (Zhou et al., 2010) can be used when it is explicitly known that features in certain groups are negatively correlated.

[3] Further levels can also be added as in (Jenatton et al., 2011), but that is beyond the scope of this paper.

When the groups overlap, (Jacob et al., 2009; Obozinski et al., 2011) proposed a modification of the group lasso penalty so that the resulting coefficients can be expressed as a union of groups. They proposed a replication-based strategy for solving the problem, which has since found application in computational biology (Jacob et al., 2009) and image processing (Rao et al., 2011), among others. The authors in (Mosci et al., 2010) proposed a method to solve the same problem in a primal-dual framework that does not require coefficient replication. Risk bounds for problems with structured sparsity inducing penalties (including the lasso and group lasso) were obtained by (Maurer and Pontil, 2012) using Rademacher complexities. Sample complexity bounds for model selection in linear regression using the group lasso (with possibly overlapping groups) also exist (Rao et al., 2012). The results naturally hold for the standard group lasso (Yuan and Lin, 2006), since non overlapping groups are a special case.

For the non overlapping case, (Stojnic et al., 2009) characterized the sample complexity of the group lasso, and also gave a semidefinite program to solve the group lasso under a block sparsity setting.

For logistic regression, (Bach, 2010; Bunea, 2008; Negahban et al., 2012; Plan and Vershynin, 2013) and references therein have extensively characterized the sample complexity of identifying the correct model using $\ell_1$ regularized optimization. The authors in (Negahban et al., 2012) extended their results to include Generalized Linear Models (GLMs) as well. In (Plan and Vershynin, 2013), the authors introduced a new framework to solve the classification problem: minimize [4] a linear cost function subject to a constraint on the $\ell_1$ norm of the solution.

1.5 Organization

The rest of the paper is organized as follows: in Section 2, we formally state our structured sparse feature selection problem and the main results of this paper.
Then in Section 3, we argue that the regularizer we propose does indeed help in recovering coefficient sparsity patterns that are both within- and across-group sparse, even when the groups overlap. In Section 4, we leverage ideas from (Plan and Vershynin, 2013) and derive measurement bounds and consistency results for the SOGlasso under a logistic regression setting. We also extend these results to handle data with correlations in their entries. We perform experiments on real and toy data in Section 5, before concluding the paper and mentioning avenues for future research in Section 6.

2. Main Results: Classification with Structured Sparsity

We now return to the problem that we wish to solve in this paper, and state our main results in a more formal way. Recall that we are interested in recovering a coefficient vector $\mathbf{x}^\star$ from (corrupted) linear observations of the form $\langle \boldsymbol{\phi}_i, \mathbf{x}^\star \rangle$. The coefficient vector of interest is assumed to have a special structure. Specifically, we assume that $\mathbf{x}^\star \in \mathcal{C} \subset B_2^p$, where $B_2^p$ is the unit Euclidean ball in $\mathbb{R}^p$. This motivates the following optimization problem (Plan and Vershynin, 2013):

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \sum_{i=1}^{n} -y_i \langle \boldsymbol{\phi}_i, \mathbf{x} \rangle \quad \text{s.t.} \quad \mathbf{x} \in \mathcal{C}. \tag{5}$$

[4] The authors in (Plan and Vershynin, 2013) write the problem as a maximization. We minimize the negative of the same function.

The function to be optimized has a very natural interpretation. We assume without loss of generality that the observations are positively correlated [5] with the inner products between the features and the coefficient vector. Hence, a natural thing to do would be to maximize the number of sign agreements between $y_i$ and $\langle \boldsymbol{\phi}_i, \mathbf{x} \rangle$. The objective function in (5) maximizes the product of the two terms, a linear relaxation of the quantity we wish to optimize.

The statistical accuracy of $\hat{\mathbf{x}}$ can be characterized in terms of the mean width of $\mathcal{C}$, which is defined as follows.

Definition 1 Let $\mathbf{g} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. The mean width of a set $\mathcal{C}$ is defined as

$$\omega(\mathcal{C}) = \mathbb{E}_{\mathbf{g}} \left[ \sup_{\mathbf{x} \in \mathcal{C} - \mathcal{C}} \langle \mathbf{x}, \mathbf{g} \rangle \right],$$

where $\mathcal{C} - \mathcal{C}$ denotes the Minkowski set difference.

We now restate a result from (Plan and Vershynin, 2013).

Theorem 2 (Corollary 1.2 in (Plan and Vershynin, 2013)) Let $\boldsymbol{\Phi} \in \mathbb{R}^{n \times p}$ be a matrix with i.i.d. standard Gaussian entries, and let $\mathcal{C} \subset B_2^p$. Fix $\mathbf{x}^\star \in \mathcal{C}$, and assume the observations follow the model (1) above. Then, for $\delta > 0$, if

$$n \ge C \delta^{-2} \sigma_f^2\, \omega(\mathcal{C})^2,$$

then with probability at least $1 - 8 \exp\left(-c\, (\delta / \sigma_f)^2\, n\right)$, the solution $\hat{\mathbf{x}}$ to the problem (5) satisfies

$$\|\hat{\mathbf{x}} - \mathbf{x}^\star\|^2 \le \delta,$$

with $\sigma_f$ defined in (3).

We abuse notation and define the mean width as

$$\omega(\mathcal{C}) = \mathbb{E}_{\mathbf{g}} \left[ \sup_{\mathbf{x} \in \mathcal{C}} \langle \mathbf{x}, \mathbf{g} \rangle \right]. \tag{6}$$

The quantity defined above is a constant multiple of that in the original definition for centrally symmetric sets $\mathcal{C}$ (Plan and Vershynin, 2013), which will be the case for the remainder of this paper.

In this paper, we construct a new penalty that produces a convex set $\mathcal{C}$ that encourages structured sparsity in the solution of (5). We show that the resulting optimization can be efficiently solved. We bound the mean width of the set, which yields new bounds for classification with structured sparsity, via Theorem 2. We state the main results in this section, and defer the proofs to Section 4.

[5] If the correlation is negative, the signs of $y_i$ can be reversed.
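To make (5) concrete, consider the simplest possible constraint set, the full Euclidean unit ball $\mathcal{C} = B_2^p$. This choice is an assumption made purely for illustration and is not the structured set constructed in this paper. Because the objective is linear, the minimizer over the unit ball is the normalized vector $\boldsymbol{\Phi}^T \mathbf{y} / \|\boldsymbol{\Phi}^T \mathbf{y}\|_2$. The sketch below simulates 1-bit observations $y_i = \operatorname{sign}(\langle \boldsymbol{\phi}_i, \mathbf{x}^\star \rangle)$ and reports the squared estimation error; the dimensions, seed, and function name are hypothetical.

```python
import numpy as np

def solve_over_unit_ball(Phi, y):
    """Minimize sum_i -y_i <phi_i, x> over the Euclidean unit ball.

    The objective equals -<Phi^T y, x>, so the minimizer is Phi^T y scaled to unit norm.
    """
    direction = Phi.T @ y
    return direction / np.linalg.norm(direction)

rng = np.random.default_rng(0)
n, p = 2000, 50

# A unit-norm ground truth supported on the first five coordinates.
x_star = np.zeros(p)
x_star[:5] = 1.0
x_star /= np.linalg.norm(x_star)

Phi = rng.standard_normal((n, p))   # i.i.d. standard Gaussian features
y = np.sign(Phi @ x_star)           # 1-bit observations, a special case of model (1)

x_hat = solve_over_unit_ball(Phi, y)
squared_error = float(np.sum((x_hat - x_star) ** 2))
print(f"squared error: {squared_error:.4f}")
```

With no sparsity structure in the constraint set, the number of samples needed grows with the ambient dimension $p$; the structured sets considered in this paper are meant to reduce that requirement.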

2.1 A New Penalty for Structured Sparsity

Assume that the features can be grouped according to similarity into $K$ (possibly overlapping) groups $\mathcal{G} = \{G_1, G_2, \ldots, G_K\}$, with the largest group being of size $L$, and consider the following definition of structured sparsity.

Definition 3 We say that a vector $\mathbf{x}$ is $(k, l)$-group sparse if $\mathbf{x}$ is supported on at most $k \le K$ groups and at most $l$ elements in each active group are non zero. Note that $l = 0$ corresponds to $\mathbf{x} = \mathbf{0}$.

To encourage such sparsity patterns we define the following penalty. Given a group $G \in \mathcal{G}$, we define the set

$$\mathcal{W}_G = \{\mathbf{w} \in \mathbb{R}^p : w_i = 0 \text{ if } i \notin G\}.$$

We can then define

$$\mathcal{W}(\mathbf{x}) = \left\{ \mathbf{w}_{G_1} \in \mathcal{W}_{G_1},\ \mathbf{w}_{G_2} \in \mathcal{W}_{G_2},\ \ldots,\ \mathbf{w}_{G_K} \in \mathcal{W}_{G_K} : \sum_{G \in \mathcal{G}} \mathbf{w}_G = \mathbf{x} \right\}.$$

That is, each element of $\mathcal{W}(\mathbf{x})$ is a set of vectors, one from each $\mathcal{W}_G$, such that the vectors sum to $\mathbf{x}$. As shorthand, in the sequel we write $\{\mathbf{w}_G\} \in \mathcal{W}(\mathbf{x})$ to mean a set of vectors that form an element in $\mathcal{W}(\mathbf{x})$.

For any $\mathbf{x} \in \mathbb{R}^p$, define

$$h(\mathbf{x}) := \inf_{\{\mathbf{w}_G\} \in \mathcal{W}(\mathbf{x})} \sum_{G \in \mathcal{G}} \left( \alpha_G \|\mathbf{w}_G\|_2 + \beta_G \|\mathbf{w}_G\|_1 \right), \tag{7}$$

where the $\alpha_G, \beta_G > 0$ are constants that trade off the contributions of the $\ell_2$ and the $\ell_1$ norm terms per group, respectively. The SOGlasso is the optimization in (5) with $h(\mathbf{x})$ as defined in (7) determining the structure of the constraint set $\mathcal{C}$, and hence the form of the solution $\hat{\mathbf{x}}$. The $\ell_2$ penalty promotes the selection of only a subset of the groups, and the $\ell_1$ penalty promotes the selection of only a subset of the features within a group.

To keep the exposition simple, we will work with the following definition of $h(\mathbf{x})$ in the rest of the paper:

$$h(\mathbf{x}) := \inf_{\{\mathbf{w}_G\} \in \mathcal{W}(\mathbf{x})} \sum_{G \in \mathcal{G}} \left( \|\mathbf{w}_G\|_2 + \frac{\lambda_1}{\sqrt{l}} \|\mathbf{w}_G\|_1 \right). \tag{8}$$

Note that the value of $\lambda_1 \ge 0$ can be varied to either emphasize or de-emphasize the $\ell_1$ penalty. In almost all applications of interest, the value of $l$ will be unknown, and the quantity $\lambda_1 / \sqrt{l}$ needs to be tuned via cross validation. However, for the sake of proving our theorems, we will assume that the quantity is known (our goal is to recover vectors that are $(k, l)$-group sparse).
This is consistent with other results in the literature where it is assumed that the parameters are known.

Definition 4 We say the set of vectors $\{\mathbf{w}_G\} \in \mathcal{W}(\mathbf{x})$ is an optimal representation of $\mathbf{x}$ if they achieve the infimum in (8).
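When the groups are disjoint, every $\mathbf{x}$ has exactly one decomposition in $\mathcal{W}(\mathbf{x})$, namely $\mathbf{w}_G$ equal to $\mathbf{x}$ restricted to $G$, so the infimum in (8) can be evaluated directly. The sketch below does this for the disjoint case only (the overlapping case requires solving the infimum and is not handled here). It also checks numerically that a unit-norm $(k, l)$-group sparse vector satisfies $h(\mathbf{x}) \le \sqrt{k}\,(1 + \lambda_1)$, which follows from the Cauchy-Schwarz inequality since $\|\mathbf{w}_G\|_1 \le \sqrt{l}\, \|\mathbf{w}_G\|_2$ for $l$-sparse $\mathbf{w}_G$. The function name and all parameter values are hypothetical.

```python
import numpy as np

def sog_penalty_disjoint(x, groups, lam1, l):
    """Evaluate h(x) from (8) for disjoint groups, where w_G is x restricted to G."""
    total = 0.0
    for G in groups:
        x_G = x[G]
        total += np.linalg.norm(x_G, 2) + (lam1 / np.sqrt(l)) * np.linalg.norm(x_G, 1)
    return total

rng = np.random.default_rng(1)
K, L = 20, 10          # number of groups and group size (illustrative values)
k, l = 3, 2            # active groups and non-zeros per active group
lam1 = 0.5

# Disjoint groups of consecutive indices: {0..9}, {10..19}, ...
groups = [np.arange(j * L, (j + 1) * L) for j in range(K)]

# Draw a random (k, l)-group sparse vector and scale it to unit Euclidean norm.
x = np.zeros(K * L)
for j in rng.choice(K, size=k, replace=False):
    idx = rng.choice(groups[j], size=l, replace=False)
    x[idx] = rng.standard_normal(l)
x /= np.linalg.norm(x)

h_val = sog_penalty_disjoint(x, groups, lam1, l)
bound = np.sqrt(k) * (1.0 + lam1)
print(f"h(x) = {h_val:.4f}, sqrt(k) * (1 + lam1) = {bound:.4f}")
```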

The objective function in (8) is convex and coercive. Hence, $\forall \mathbf{x}$, an optimal representation always exists.

The function $h(\mathbf{x})$ yields a convex relaxation for $(k, l)$-group sparsity. Define the constraint set

$$\mathcal{C} = \left\{ \mathbf{x} : h(\mathbf{x}) \le \sqrt{k}\,(1 + \lambda_1),\ \|\mathbf{x}\|_2 \le 1 \right\}. \tag{9}$$

We show that $\mathcal{C}$ is convex and contains all $(k, l)$-group sparse vectors. We compute the mean width of $\mathcal{C}$ in (9), and subsequently obtain the following result:

Theorem 5 Suppose there exists a coefficient vector $\mathbf{x}^\star$ that is $(k, l)$-group sparse. Suppose the data matrix $\boldsymbol{\Phi} \in \mathbb{R}^{n \times p}$ and observation model follow the setting in Theorem 2. Suppose we solve (5) for the constraint set given by (9). For $\delta > 0$ and a constant $C$, if the number of measurements satisfies

$$n \ge C\, \frac{\sigma_f^2}{\delta^2}\, k\, \min\left\{ (1 + \lambda_1)^2 \left(\log(K) + L\right),\ \left(\frac{1 + \lambda_1}{\lambda_1}\right)^2 l \log(p) \right\},$$

then with high probability, the solution of the SOGlasso satisfies

$$\|\hat{\mathbf{x}} - \mathbf{x}^\star\|_2^2 \le \delta.$$

Remarks

The results of Theorem 5 yield new results for classification with structured sparsity under the general binary observation setting (1). Specifically, note that the SOGlasso interpolates between the standard $\ell_1$ regularized (lasso) and the group $\ell_1$ regularized (group lasso) classification techniques:

- When $\lambda_1 = 0$, we obtain results for the group lasso. The result remains the same whether or not the groups overlap. The bound is given by $n \ge C \delta^{-2} k \left(\log(K) + L\right)$. Note that this result is similar to that obtained for the linear regression case by the authors in (Rao et al., 2012).

- When all the groups are singletons ($L = l = 1$), the bound reduces to that for the standard lasso, with $KL = p$ being the ambient dimension. In this case, the signal sparsity is $s := k$ and the bound becomes $n \ge C \delta^{-2} k \log(p)$.

The SOGlasso generalizes the lasso and the group lasso, and allows one to recover signals that are sparse, group sparse, or a combination of the two structures. Moreover, since the model we consider subsumes the logistic regression setting, we obtain results for logistic regression with a general structured sparsity constraint on the solution.
To the best of our knowledge, these are the first known sample complexity bounds for the group lasso for logistic regression with overlapping groups, and the sparse group lasso, both of which arise as special cases of the SOGlasso.

Problem (5) admits an efficient solution. Specifically, we can use the feature replication strategy as in (Jacob et al., 2009) to reduce the problem to a sparse group lasso, and use proximal point methods to recover the coefficient vector. We elaborate on this in more detail later in the paper.

Theorem 5 bounds the number of measurements sufficient for accurate estimation using the overlapping group lasso and the lasso. A natural question to then ask is whether one can do better when it is known that the vectors we are interested in are further constrained by the number of non zero entries in active groups. When it is known that each coefficient belongs to at most $R$ groups [6], we obtain the following sample complexity bound.

Theorem 6 Suppose the coefficient vector is $(k, l)$-group sparse, with everything else the same as in Theorem 5. Suppose that each coefficient belongs to at most $R$ groups. Then, with high probability, if the number of measurements satisfies

$$n \ge C\, \frac{\sigma_f^2}{\delta^2}\, R^2\, k \left[ \log\left(\frac{K}{k}\right) + l \log\left(\frac{L}{l}\right) + l + 2 \right],$$

we have

$$\|\hat{\mathbf{x}} - \mathbf{x}^\star\|^2 \le \delta.$$

The above result yields a tight bound when the groups do not overlap. Indeed, when $R = 1$ we see that the sample complexity bound is a function of the logarithm of the number of groups, and the overall sparsity of the signal $kl$. This is a tighter result than the bound obtained for the group lasso, where the second term would be $O(kL)$. We pay a price of $R^2$ for overlapping groups, but in most practical examples we are interested in, $R$ is typically small, and is a constant.

3. Analysis of the SOGlasso Penalty

Recall the definition of $h(\mathbf{x})$ from (8):

$$h(\mathbf{x}) = \inf_{\{\mathbf{w}_G\} \in \mathcal{W}(\mathbf{x})} \sum_{G \in \mathcal{G}} \left( \|\mathbf{w}_G\|_2 + \mu \|\mathbf{w}_G\|_1 \right), \tag{10}$$

where we set $\mu = \lambda_1 / \sqrt{l}$.

Remarks: The SOGlasso penalty can be seen as a generalization of different penalty functions previously explored in the context of sparse linear regression and/or classification:

- If each group in $\mathcal{G}$ is a singleton, then the SOGlasso penalty reduces to the standard $\ell_1$ norm, and the problem reduces to the lasso (Tibshirani, 1996; Bunea, 2008).

- If $\lambda_1 = 0$ in (8), then we are left with the latent group lasso (Jacob et al., 2009; Obozinski et al., 2011; Rao et al., 2012). This allows us to recover sparsity patterns that can be expressed as lying in a union of groups. If a group is selected, then all the coefficients in the group are selected.

[6] The value of $R$ will always be known, since we assume that the groups $\mathcal{G}$ are known.

- If the groups $G \in \mathcal{G}$ are non overlapping, then (8) reduces to the sparse group lasso (Simon et al., 2013). Of course, for non overlapping groups, if $\lambda_1 = 0$, then we get the standard group lasso (Yuan and Lin, 2006).

Figure 3 shows the effect that the parameter $\mu$ has on the shape of the ball $\|\mathbf{w}_G\| + \mu \|\mathbf{w}_G\|_1 \le \delta$, for a single two dimensional group $G$.

Figure 3: Effect of $\mu$ on the shape of the set $\|\mathbf{w}_G\| + \mu \|\mathbf{w}_G\|_1 \le \delta$, for a two dimensional group $G$. Panels: (a) $\mu = 0$; (b) $\mu = 0.2$; (c) $\mu = 1$; (d) $\mu = 10$. $\mu = 0$ (a) yields level sets for the $\ell_2$ norm ball. As the value of $\mu$ is increased, the effect of the $\ell_1$ norm term increases (b), (c). Finally, as $\mu$ gets very large, the level sets resemble that of the $\ell_1$ ball (d).

3.1 Properties of SOGlasso Penalty

The example in Table 1 gives an insight into the kind of sparsity patterns preferred by the function $h(\mathbf{x})$. We will tend to prefer solutions that have a small value of $h(\cdot)$. Consider 3 instances of $\mathbf{x} \in \mathbb{R}^{10}$, and the corresponding group lasso, $\ell_1$ norm, and $h(\mathbf{x})$ function values. The vector is assumed to be made up of two groups, $G_1 = \{1, 2, 3, 4, 5\}$ and $G_2 = \{6, 7, 8, 9, 10\}$. $h(\mathbf{x})$ is smallest when the support set is sparse within groups, and also when only one of the two groups is selected (column 5). The $\ell_1$ norm does not take into account sparsity across groups (column 4), while the group lasso norm does not take into account sparsity within groups (column 3). Since the groups do not overlap, the latent group lasso penalty reduces to the group lasso penalty and $h(\mathbf{x})$ reduces to the sparse group lasso penalty.

Support          | Values           | $\sum_G \|\mathbf{x}_G\|$ | $\|\mathbf{x}\|_1$ | $\sum_G (\|\mathbf{x}_G\| + \|\mathbf{x}_G\|_1)$
{1, 4, 9}        | {3, 4, 7}        | 12                        | 14                 | 26
{1, 2, 3, 4, 5}  | {2, 5, 2, 4, 5}  | 8.602                     | 18                 | 26.602
{1, 3, 4}        | {3, 4, 7}        | 8.602                     | 14                 | 22.602

Table 1: Different instances of a 10-d vector and their corresponding norms.

The next table shows that $h(\mathbf{x})$ indeed favors solutions that are not only group sparse, but also exhibit sparsity within groups when the groups overlap. Consider again a 10-dimensional vector $\mathbf{x}$ with three overlapping groups $\{1, 2, 3, 4\}$, $\{3, 4, 5, 6, 7\}$ and $\{7, 8, 9, 10\}$.
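The three quantities compared in Table 1 depend only on the supports, values, and the two disjoint groups $G_1 = \{1, \ldots, 5\}$ and $G_2 = \{6, \ldots, 10\}$ stated there, so they can be computed directly. The sketch below is one way to do so; the helper names are hypothetical, and the last column is taken with unit weight on the $\ell_1$ term, as in the table header.

```python
import numpy as np

# The two disjoint groups from Table 1, converted to zero-based indices.
groups = [np.arange(0, 5), np.arange(5, 10)]

def make_vector(support, values, p=10):
    """Build a length-p vector from a one-based support and the matching values."""
    x = np.zeros(p)
    x[np.array(support) - 1] = values
    return x

def table_row(x):
    """Return (group lasso norm, l1 norm, sparse group lasso penalty) for x."""
    group_l2 = sum(np.linalg.norm(x[G]) for G in groups)
    l1 = np.linalg.norm(x, 1)
    # The groups partition the coordinates, so the per-group l1 norms sum to ||x||_1.
    return float(group_l2), float(l1), float(group_l2 + l1)

instances = [
    ([1, 4, 9], [3, 4, 7]),
    ([1, 2, 3, 4, 5], [2, 5, 2, 4, 5]),
    ([1, 3, 4], [3, 4, 7]),
]
results = [table_row(make_vector(s, v)) for s, v in instances]
for (support, _), row in zip(instances, results):
    print(support, [round(value, 3) for value in row])
```

The third instance attains the smallest value in the last column because it is both confined to a single group and sparse within that group.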
Suppose the vector $x = [\ ]^T$. From the form of the function in (8), we see that the vector can be seen as a sum of three vectors $w_i$, $i = 1, 2, 3$, corresponding

to the three groups listed above. Consider the following instances of the $w_i$ vectors, which are all feasible solutions for the optimization problem in (10):

1. $w_1 = [\ ]^T$, $w_2 = [\ ]^T$, $w_3 = [\ ]^T$
2. $w_1 = [\ ]^T$, $w_2 = [\ ]^T$, $w_3 = [\ ]^T$
3. $w_1 = [\ ]^T$, $w_2 = [\ ]^T$, $w_3 = [\ ]^T$
4. $w_1 = [\ ]^T$, $w_2 = [\ ]^T$, $w_3 = [\ ]^T$

In the above list, the first instance corresponds to the case where the support is localized to two groups, and one of these groups (group 2) has only one zero. The second case corresponds to the case where all 3 groups have non-zeros in them. The third case has support localized to two groups, and both groups are sparse. Finally, the fourth case has only the second group having non-zero coefficients, and this group is also sparse. Table 2 shows that the smallest value of the sum of the terms is achieved by the fourth decomposition, and hence that will correspond to the optimal representation.

$A = \|w_1\| + \mu\|w_1\|_1$    $B = \|w_2\| + \mu\|w_2\|_1$    $C = \|w_3\| + \mu\|w_3\|_1$    $A + B + C$
$1 + \mu$                       $2 + 4\mu$                      $0$                             $3 + 5\mu$
$1 + \mu$                       $1 + \mu$                       $1 + \mu$                       $3 + 3\mu$
$0$                             $\sqrt{2} + 2\mu$               $1 + \mu$                       $1 + \sqrt{2} + 3\mu$
$0$                             $\sqrt{3} + 3\mu$               $0$                             $\sqrt{3} + 3\mu$

Table 2: Values of the sum of the $\ell_1$ and $\ell_2$ norms corresponding to the decompositions listed above. Note that the optimal representation corresponds to the case $w_1 = w_3 = 0$, and $w_2$ being a sparse vector.

Lastly, we can show that $h(x)$ is a norm. This will imply that $h(x)$ is convex, and hence the penalty we consider will be convex. This will then mean that the optimization we are interested in solving is a convex program.

Lemma 7 The function

$$h(x) = \inf_{\{w_G\} \in W(x)} \sum_{G \in \mathcal{G}} \left( \|w_G\|_2 + \mu \|w_G\|_1 \right)$$

is a norm.

Proof It is trivial to show that $h(x) \ge 0$ with equality iff $x = 0$. We now show positive homogeneity. Suppose $\{w_G\} \in W(x)$ is an optimal representation (Definition 4) of $x$, and

let $\gamma \in \mathbb{R} \setminus \{0\}$. Then $\sum_G w_G = x \Rightarrow \sum_G \gamma w_G = \gamma x$. This leads to the following set of inequalities:

$$h(x) = \sum_{G} \left( \|w_G\|_2 + \mu \|w_G\|_1 \right) = \frac{1}{|\gamma|} \sum_{G} \left( \|\gamma w_G\|_2 + \mu \|\gamma w_G\|_1 \right) \ge \frac{1}{|\gamma|} h(\gamma x). \qquad (11)$$

Now, assuming $\{v_G\} \in W(\gamma x)$ is an optimal representation of $\gamma x$, we have that $\sum_G v_G = \gamma x$, and we get

$$h(\gamma x) = \sum_{G} \left( \|v_G\|_2 + \mu \|v_G\|_1 \right) = |\gamma| \sum_{G} \left( \left\| \frac{v_G}{\gamma} \right\|_2 + \mu \left\| \frac{v_G}{\gamma} \right\|_1 \right) \ge |\gamma| h(x). \qquad (12)$$

Positive homogeneity follows from (11) and (12). The inequalities are a result of the possibility of the vectors not corresponding to the respective optimal representations.

For the triangle inequality, again let $\{w_G\} \in W(x)$, $\{v_G\} \in W(y)$ correspond to the optimal representations for $x$, $y$ respectively. Then by definition,

$$h(x + y) \le \sum_{G} \left( \|w_G + v_G\|_2 + \mu \|w_G + v_G\|_1 \right) \le \sum_{G} \left( \|w_G\|_2 + \|v_G\|_2 + \mu \|w_G\|_1 + \mu \|v_G\|_1 \right) = h(x) + h(y).$$

The first and second inequalities follow by definition and the triangle inequality respectively.

3.2 Solving the SOGlasso Problem

We solve the Lagrangian version of the SOGlasso problem:

$$\hat{x} = \arg\min_x \left( -\sum_{i=1}^{n} y_i \langle \phi_i, x \rangle \right) + \eta_1 h(x) + \eta_2 \|x\|_2,$$

where $\eta_1 > 0$ controls the amount by which we regularize the coefficients to have a structured sparsity pattern, and $\eta_2 > 0$ prevents the coefficients from taking very large values. We use the covariate duplication method of (Jacob et al., 2009) to first reduce the problem to the non-overlapping sparse group lasso in an expanded space. One can then use proximal methods to recover the coefficients.

Proximal point methods progress by taking a step in the direction of the negative gradient, and applying a shrinkage/proximal point mapping to the iterate. This mapping can be computed efficiently for the non-overlapping sparse group lasso, as it is a special case of general hierarchical structured penalties (Jenatton et al., 2011). The proximal point mapping can be seen as the composition of the standard soft thresholding and the group

soft thresholding operators:

$$\tilde{w} = \mathrm{sign}(w_r) \left[ |w_r| - \eta_1 \mu \right]_+$$

$$(w_{t+1})_G = \frac{(\tilde{w})_G}{\|(\tilde{w})_G\|} \left[ \|(\tilde{w})_G\| - \eta_1 \right]_+ \ \text{if } \|(\tilde{w})_G\| \ne 0, \qquad (w_{t+1})_G = 0 \ \text{otherwise},$$

where $w_r$ corresponds to the iterate after a gradient step and $[\cdot]_+ = \max(0, \cdot)$. Once the solution is obtained in the duplicated space, we then recombine the duplicates to obtain the solution in the original space. Finally, we perform a debiasing step to obtain the final solution.

4. Proof of Theorem 5, Theorem 6 and Extensions to Correlated Data

In this section, we compute the mean width of the constraint set $C$ in (9), which will be used to prove Theorems 5 and 6. First we define the following function, analogous to the $\ell_0$ pseudo-norm:

Definition 8 Given a set of $K$ groups $\mathcal{G}$, for any vector $x$ and its optimal representation $\{w_G\} \in W(x)$, noting that $x = \sum_G w_G$, define

$$\|x\|_{\mathcal{G},0} = \sum_{G \in \mathcal{G}} 1_{\{\|w_G\| \ne 0\}}.$$

In the above definition, $1_{\{\cdot\}}$ is the indicator function. Define the set

$$C_{nc}(k, l) = \left\{ x : x = \sum_{G \in \mathcal{G}} w_G, \ \|x\|_{\mathcal{G},0} \le k, \ \|w_G\|_0 \le kl \ \ \forall G \in \mathcal{G} \right\}. \qquad (13)$$

We see that $C_{nc}(k, l)$ contains $(k, l)$-group sparse signals (Definition 3). From the above definitions and our problem setup, our aim is to ideally solve the following optimization problem

$$\hat{x} = \arg\min_x -\sum_{i=1}^{n} y_i \langle \phi_i, x \rangle \quad \text{s.t. } x \in C_{nc}(k, l). \qquad (14)$$

However, the set $C_{nc}(k, l)$ is not convex, and hence solving (14) will be hard in general. We instead consider a convex relaxation of the above problem. The convex relaxation of the (overlapping) group $\ell_0$ pseudo-norm is the (overlapping) group $\ell_1/\ell_2$ norm. This leads to the following result:

Lemma 9 The SOGlasso penalty (10) admits a convex relaxation of $C_{ideal}(k, l)$. Specifically, we can consider the set

$$C(k, l) = \left\{ x : h(x) \le \sqrt{k}\,(1 + \lambda_1) \|x\|_2 \right\}$$

as a tight convex relaxation containing the set $C_{nc}(k, l)$.

Proof Consider a $(k, l)$-group sparse vector $x \in C_{nc}(k, l)$. For any such vector, there exist vectors $\{v_G\} \in W(x)$ such that the supports of the $v_G$ do not overlap. We then have the following set of inequalities:

$$\begin{aligned}
h(x) &= \inf_{\{w_G\} \in W(x)} \sum_{G} \left( \|w_G\|_2 + \frac{\lambda_1}{\sqrt{l}} \|w_G\|_1 \right) \\
&\overset{(i)}{\le} \sum_{G} \left( \|v_G\|_2 + \frac{\lambda_1}{\sqrt{l}} \|v_G\|_1 \right) \\
&\overset{(ii)}{\le} \sum_{G} \left( \|v_G\|_2 + \frac{\lambda_1}{\sqrt{l}} \sqrt{l}\, \|v_G\|_2 \right) = (1 + \lambda_1) \sum_{G} \|v_G\|_2 \\
&\overset{(iii)}{\le} \sqrt{k}\,(1 + \lambda_1) \left( \sum_{G} \|v_G\|_2^2 \right)^{1/2} = \sqrt{k}\,(1 + \lambda_1) \|x\|_2,
\end{aligned}$$

where (i) follows from the definition of the function $h(x)$ in (8), and (ii) and (iii) follow from the fact that for any vector $v \in \mathbb{R}^d$ we have $\|v\|_1 \le \sqrt{d}\, \|v\|_2$. This, coupled with the fact that $h(x)$ is a norm (Lemma 7), ensures that the set $C(k, l)$ is convex.

To show that the relaxation is tight, we will consider a $(k, l)$ sparse vector $x$ and show that the inequality in the definition of the set holds with equality. Specifically, let $x \in \mathbb{R}^p$ with non-overlapping groups, and let the first $k$ $w_G$'s in its representation be active. Moreover, suppose the first $l$ entries in each of these $w_G$'s are non-zero. Let the non-zero entries all be equal to $1/\sqrt{kl}$. Then $\|x\| = 1$, $\sum_G \|w_G\|_2 = \sqrt{k}$ and $\sum_G \|w_G\|_1 = \sqrt{kl}$. The result follows.

4.1 Mean Widths for the SOGlasso

We see that the mean width of the constraint set plays a crucial role in determining the consistency of the solution of the optimization problem. We now aim to find the mean width of the constraint set in (9), and as a result of it, prove Theorems 5 and 6. Before we do so, we restate Lemma 3.2 in (Rao et al., 2012) for the sake of completeness:

Lemma 10 Let $q_1, \ldots, q_K$ be $K$ $\chi^2$ random variables with $d$ degrees of freedom. Then

$$E\left[ \max_{1 \le i \le K} q_i \right] \le \left( \sqrt{2 \log(K)} + \sqrt{d} \right)^2.$$

First, we prove Theorem 6.

Lemma 11 Suppose that it is known that each coefficient is part of at most $R$ groups, and suppose we let

$$h(x) = \inf_{\{w_G\} \in W(x)} \sum_{G \in \mathcal{G}} \left( \|w_G\|_2 + \frac{\lambda_1}{\sqrt{l}} \|w_G\|_1 \right).$$

Then the mean width of the set $C = \{x : h(x) \le \sqrt{k}\,(1 + \lambda_1), \ \|x\|_2 \le 1\}$ is bounded as

$$\omega(C)^2 \le C R^2 k \left[ \log\frac{K}{k} + l \log\frac{L}{l} \right].$$

Proof The intuition behind this proof is as follows: we first consider a non-convex set, which is the ideal set of $(k, l)$ sparse vectors that we are interested in. We then show that $C$ is contained in the scaled convex hull of the non-convex set, and hence by the properties of the mean width, $\omega(C)$ can be bounded by a scaling of that of the non-convex set (footnote 7). To this end, let us consider the following non-convex ideal set,

$$C_{nc} = \left\{ x : \|x\| \le 1, \ \|x\|_{\mathcal{G},0} \le k, \ \|w_G\|_0 \le kl \right\}. \qquad (15)$$

Consider $x \in C$. We now define vectors $x_i$ as follows: $x_1 = \sum_{r=1}^{k} w_1^r$, where the vectors $w_1^r$ are the $k$ vectors $w_G$ with largest norm. Along these lines, we define $x_i = \sum_{r=1}^{k} w_i^r$. For a fixed $i$, let $x_{i1}$ be the vector containing the top $kl$ entries of $x_i$ by magnitude, and define a general $x_{ij}$ in this manner as well. Note that $x_i = \sum_j x_{ij}$ and $x = \sum_i x_i$. Also, note that $x_{ij}/\|x_{ij}\|_2 \in C_{nc}$, since it has at most $k$ active groups, and at most $kl$ non-zero elements. Finally, note the following: by construction, we have for a fixed $i$, and $j > 1$,

$$\|x_{ij}\|_2 \le \frac{1}{\sqrt{kl}} \|x_{i,j-1}\|_1. \qquad (16)$$

This follows since each element of $x_{ij}$ is smaller than the average of the entries of the vector $x_{i,j-1}$. Using the exact same argument, we also have for $i > 1$,

$$\left( \sum_{r=1}^{k} \|w_i^r\|_2^2 \right)^{1/2} \le \frac{1}{\sqrt{k}} \sum_{r=1}^{k} \|w_{i-1}^r\|_2. \qquad (17)$$

7. Lemma 9 showed that the set $C$ is a tight convex outer relaxation of the non-convex set.

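Now,

$$\begin{aligned}
\sum_{i}\sum_{j} \|x_{ij}\|_2 &= \|x_{11}\|_2 + \sum_{i>1} \|x_{i1}\|_2 + \sum_{i} \sum_{j>1} \|x_{ij}\|_2 \\
&\le \|x_{11}\|_2 + \sum_{i>1} \|x_{i1}\|_2 + \frac{1}{\sqrt{kl}} \sum_{i} \sum_{j>1} \|x_{i,j-1}\|_1 \quad \text{from (16)} \\
&\le \|x_{11}\|_2 + \sum_{i>1} \|x_{i1}\|_2 + \frac{1}{\sqrt{kl}} \sum_{i} \sum_{j} \|x_{ij}\|_1 \\
&\le \|x_{11}\|_2 + \sum_{i>1} \|x_{i1}\|_2 + \frac{1}{\sqrt{kl}} \sum_{i} \|x_i\|_1 \quad \text{since the indices for } j \text{ are disjoint} \\
&= \|x_{11}\|_2 + \sum_{i>1} \|x_{i1}\|_2 + \frac{1}{\sqrt{kl}} \sum_{i} \left\| \sum_{r=1}^{k} w_i^r \right\|_1 \\
&\le \|x_{11}\|_2 + \sum_{i>1} \|x_{i1}\|_2 + \frac{1}{\sqrt{kl}} \sum_{i} \sum_{r=1}^{k} \|w_i^r\|_1 \quad \text{triangle inequality} \\
&= \|x_{11}\|_2 + \sum_{i>1} \|x_{i1}\|_2 + \frac{1}{\sqrt{kl}} \sum_{G} \|w_G\|_1. \qquad (18)
\end{aligned}$$

For $i > 1$, we have the following bound:

$$\begin{aligned}
\|x_{i1}\|_2^2 &\le \left\| \sum_{r=1}^{k} w_i^r \right\|_2^2 = \sum_{m=1}^{p} \left( \sum_{r=1}^{k} (w_i^r)_m \right)^2 \\
&\le \sum_{m=1}^{p} R^2 \max_{r} (w_i^r)_m^2 \le R^2 \sum_{m=1}^{p} \sum_{r=1}^{k} (w_i^r)_m^2 = R^2 \sum_{r=1}^{k} \|w_i^r\|_2^2.
\end{aligned}$$

The above inequality and (17) combine to give

$$\|x_{i1}\|_2 \le \frac{R}{\sqrt{k}} \sum_{r=1}^{k} \|w_{i-1}^r\|_2. \qquad (19)$$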

Substituting this in (18) and noting that $\|x_{11}\|_2 \le \|x\|_2 \le 1$, we have

$$\begin{aligned}
\sum_{i}\sum_{j} \|x_{ij}\|_2 &\le 1 + \frac{R}{\sqrt{k}} \sum_{G} \|w_G\|_2 + \frac{1}{\sqrt{kl}} \sum_{G} \|w_G\|_1 \\
&= 1 + \frac{R}{\sqrt{k}} \left( \sum_{G} \|w_G\|_2 + \frac{1}{R\sqrt{l}} \sum_{G} \|w_G\|_1 \right). \qquad (20)
\end{aligned}$$

If $\lambda_1 \ge 1/R$, then the term in the parentheses is bounded by $\sqrt{k}\,(1 + \lambda_1)$, and if not, then it is bounded by $\sqrt{k}\,(1 + 1/R)$. This gives:

$$\sum_{i}\sum_{j} \|x_{ij}\|_2 \le 1 + R + \max(1, R\lambda_1). \qquad (21)$$

The following argument finishes the proof. Set $\beta = 1 + R + \max(1, R\lambda_1)$.

1. By construction, we have $x = \sum_i \sum_j x_{ij}$.
2. Also by construction, $x_{ij}/\|x_{ij}\|_2 \in C_{nc}$.
3. Now, letting $\theta_{ij} = \|x_{ij}\|_2/\beta$, we showed $x = \beta \sum_i \sum_j \theta_{ij}\, x_{ij}/\|x_{ij}\|_2$.
4. We showed that $\sum_i \sum_j \theta_{ij} \le 1$, so that $x/\beta$ can be written as a convex combination of the $x_{ij}/\|x_{ij}\|_2$, which are elements in $C_{nc}$. This means that $x \in \beta\, \mathrm{conv}(C_{nc})$.

We then have the following bound for the mean width of $C$:

$$\omega(C)^2 \le C R^2 \omega(C_{nc})^2.$$

It now remains to compute $\omega(C_{nc})$. Lemma 12 yields the desired result.

Lemma 12 For $C_{nc} = \{x : \|x\| \le 1, \ \|x\|_{\mathcal{G},0} \le k, \ \|w_G\|_0 \le kl\}$, we have

$$\omega(C_{nc})^2 \le C k \left[ \log\frac{K}{k} + l \log\frac{L}{l} \right].$$

We prove this in Appendix A. We make use of Lemma 10 to obtain the result.

We now proceed to prove Theorem 5. To do so, we adopt a different strategy than the one used to prove Theorem 6. Instead of considering the non-convex ideal set, we directly consider the convex set $C$ and show that it is a subset of appropriately scaled versions of the overlapping group lasso or the lasso balls. The result then follows.
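Lemma 10, which the mean width computations rely on, is easy to sanity-check numerically. The sketch below is a Monte Carlo illustration only and is not part of the proof. It assumes independent $\chi^2$ variables, and the function names and the values of `K`, `d` and `trials` are our own choices.

```python
import numpy as np

def max_chi2_mean(K, d, trials=2000, seed=0):
    """Monte Carlo estimate of E[max_i q_i] for K independent
    chi-squared variables, each with d degrees of freedom."""
    rng = np.random.default_rng(seed)
    q = rng.chisquare(d, size=(trials, K))
    return q.max(axis=1).mean()

def lemma10_bound(K, d):
    """Right-hand side of Lemma 10: (sqrt(2 log K) + sqrt(d))^2."""
    return (np.sqrt(2.0 * np.log(K)) + np.sqrt(d)) ** 2

for K, d in [(10, 1), (50, 5), (200, 20)]:
    print(K, d, max_chi2_mean(K, d), lemma10_bound(K, d))
```

In each case the empirical mean of the maximum should fall below the bound. The size of the gap depends on $K$ and $d$.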

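Lemma 13 Consider the same set as that considered in Lemma 11. The mean width of the set can also be shown to satisfy:

$$\omega(C)^2 \le C k \min\{\log K + L, \ l \log(p)\}.$$

Proof Let $g \sim \mathcal{N}(0, I)$, and for a given $x$, let $\{w_G\} \in W(x)$ be its optimal representation (Definition 4). Since $x = \sum_G w_G$, we have

$$\begin{aligned}
\max_{x \in C} g^T x &= \max_{x \in C} g^T \sum_{G} w_G \\
&= \max_{\{w_G\} \in W(x)} \sum_{G} g^T w_G \quad \text{s.t. } \sum_{G} \left( \|w_G\|_2 + \frac{\lambda_1}{\sqrt{l}} \|w_G\|_1 \right) \le \sqrt{k}\,(1 + \lambda_1) \qquad (22) \\
&\overset{(i)}{\le} \max_{\{w_G\} \in W(x)} \sum_{G} g^T w_G \quad \text{s.t. } \sum_{G} \left( 1 + \frac{\lambda_1}{\sqrt{l}} \right) \|w_G\|_2 \le \sqrt{k}\,(1 + \lambda_1) \\
&= \max_{\{w_G\} \in W(x)} \sum_{G} g^T w_G \quad \text{s.t. } \sum_{G} \|w_G\|_2 \le \frac{\sqrt{kl}\,(1 + \lambda_1)}{\sqrt{l} + \lambda_1} \\
&\overset{(ii)}{=} \frac{\sqrt{kl}\,(1 + \lambda_1)}{\sqrt{l} + \lambda_1} \max_{G} \|g_G\|_2 \qquad (23) \\
&\le \sqrt{k}\,(1 + \lambda_1) \max_{G} \|g_G\|_2, \qquad (24)
\end{aligned}$$

where we define $g_G$ to be the sub-vector of $g$ indexed by group $G$. (i) follows since the constraint set is a superset of the constraint in the expression above it, from the fact that $\|a\|_2 \le \|a\|_1 \ \forall a$, and (ii) is a result of simple convex analysis.

The mean width is then bounded as

$$\omega(C) \le \sqrt{k}\,(1 + \lambda_1)\, E\left[ \max_{G} \|g_G\|_2 \right]. \qquad (25)$$

Squaring both sides of (25), we get

$$\begin{aligned}
\omega(C)^2 &\le k\,(1 + \lambda_1)^2 \left( E\left[ \max_{G} \|g_G\|_2 \right] \right)^2 \\
&\overset{(iii)}{\le} k\,(1 + \lambda_1)^2\, E\left[ \left( \max_{G} \|g_G\|_2 \right)^2 \right] \\
&\overset{(iv)}{=} k\,(1 + \lambda_1)^2\, E\left[ \max_{G} \|g_G\|_2^2 \right],
\end{aligned}$$

where (iii) follows from Jensen's inequality and (iv) follows from the fact that the square of the maximum of non-negative numbers is the same as the maximum of the squares. Now,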

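note that since $g$ is Gaussian, $\|g_G\|_2^2$ is a $\chi^2$ random variable with at most $L$ degrees of freedom. From Lemma 10, we have

$$\omega(C)^2 \le k\,(1 + \lambda_1)^2 \left( \sqrt{2 \log(K)} + \sqrt{L} \right)^2. \qquad (26)$$

This gives us one of the two terms in the $\min\{\cdot, \cdot\}$ in the statement of the Lemma. Since $\lambda_1$ is bounded away from 0, we can treat the term in the parenthesis as a constant.

For the second term, let us revisit (22), and obtain the following inequalities:

$$\begin{aligned}
\max_{x \in C} g^T x &= \max_{\{w_G\} \in W(x)} \sum_{G} g^T w_G \quad \text{s.t. } \sum_{G} \left( \|w_G\|_2 + \frac{\lambda_1}{\sqrt{l}} \|w_G\|_1 \right) \le \sqrt{k}\,(1 + \lambda_1) \\
&\overset{(v)}{\le} \max_{\{w_G\} \in W(x)} \sum_{G} g^T w_G \quad \text{s.t. } \sum_{G} \left( \frac{1}{\sqrt{L}} \|w_G\|_1 + \frac{\lambda_1}{\sqrt{l}} \|w_G\|_1 \right) \le \sqrt{k}\,(1 + \lambda_1) \\
&= \max_{\{w_G\} \in W(x)} \sum_{G} g^T w_G \quad \text{s.t. } \left( \frac{1}{\sqrt{L}} + \frac{\lambda_1}{\sqrt{l}} \right) \sum_{G} \|w_G\|_1 \le \sqrt{k}\,(1 + \lambda_1) \\
&= \max_{\{w_G\} \in W(x)} \sum_{G} g^T w_G \quad \text{s.t. } \sum_{G} \|w_G\|_1 \le \frac{\sqrt{k}\,(1 + \lambda_1)}{\frac{1}{\sqrt{L}} + \frac{\lambda_1}{\sqrt{l}}} \\
&= \frac{\sqrt{k}\,(1 + \lambda_1)}{\frac{1}{\sqrt{L}} + \frac{\lambda_1}{\sqrt{l}}} \max_{G} \max_{i \in G} |(g_G)_i| \\
&\le \frac{\sqrt{kl}\,(1 + \lambda_1)}{\lambda_1} \max_{i} |g_i|,
\end{aligned}$$

where the constraint set in (v) is a superset of that in the statement above it. Again, after squaring both sides, taking expectations and applying Jensen's inequality,

$$\omega(C)^2 \le k l \left( \frac{1 + \lambda_1}{\lambda_1} \right)^2 E\left[ \max_{i} g_i^2 \right].$$

The quantity inside the expectation is the maximum of $\chi^2$ variables with one degree of freedom, and from Lemma 10, we obtain

$$\omega(C)^2 \le C k l \log(p).$$

This gives the second term in the $\min\{\cdot, \cdot\}$, and finishes the proof.

Lemma 13 and Theorem 2 lead directly to Theorem 5.

The results in the proof above shed some more light on our regularizer $h(x)$. If $\lambda_1 = 0$, then the problem reduces to that of classification using the overlapping group lasso penalty, and we obtain the corresponding sample complexity bound. For simple sparsity without any structure, we would want $\lambda_1$ to be large, in which case $1 + \lambda_1 \to \infty$, and $(1 + \lambda_1)/\lambda_1 \to 1$. This would then entail the bounds for the $\ell_1$ regularized problem taking over, keeping all other parameters $k$, $K$, $l$, $L$ fixed.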

4.2 Extensions to Data with Correlated Entries

The results we proved above can be extended to data with correlated Gaussian entries as well (see (Raskutti et al., 2010) for results in linear regression settings). Indeed, in most practical applications we are interested in, the features are expected to contain correlations. For example, in the fMRI application that is one of the major motivating applications of our work, it is reasonable to assume that voxels in the brain will exhibit correlation amongst themselves at a given time instant. This entails scaling the number of measurements by the condition number of the covariance matrix, where we assume that each row of the measurement matrix is sampled from a Gaussian $(0, \Sigma)$ distribution. Specifically, we obtain a generalization of the result in (Plan and Vershynin, 2013) for the SOGlasso with a correlated Gaussian design. We now consider the following constraint set:

$$C_{corr} = \left\{ x : h(x) \le \frac{1}{\sigma_{\min}(\Sigma^{1/2})} \sqrt{k}\,(1 + \lambda_1), \ \|\Sigma^{1/2} x\| \le 1 \right\}. \qquad (27)$$

We consider the set $C_{corr}$ and not $C$ in (9), since we require the constraint set to be a subset of the unit Euclidean ball. In the proof of Corollary 14 below, we will reduce the problem to an optimization over variables of the form $z = \Sigma^{1/2} x$, and hence we require $\|\Sigma^{1/2} x\|_2 \le 1$. Enforcing this constraint leads to the corresponding upper bound on $h(x)$. We now obtain the following generalization of Theorem 5, for correlated data.

Corollary 14 Let the entries of the data matrix be sampled from a $\mathcal{N}(0, \Sigma)$ distribution. Suppose the measurements follow the model in (1). Suppose we wish to recover a $(k, l)$-group sparse vector from the set $C_{corr}$ in (27). Suppose the true coefficient vector $x^\star$ satisfies $\|\Sigma^{1/2} x^\star\| = 1$. Then, so long as the number of measurements $n$ satisfies

$$n \ge \frac{C}{\delta^2}\, \kappa(\Sigma)\, k \min\{\log(K) + L, \ l \log(p)\},$$

the solution to (5) satisfies

$$\|\hat{x} - x^\star\|^2 \le \frac{\delta}{\sigma_{\min}(\Sigma)},$$

where $\sigma_{\min}(\cdot)$, $\sigma_{\max}(\cdot)$ and $\kappa(\cdot)$ denote the minimum and maximum singular values and the condition number of the corresponding matrices respectively. We prove this result in Appendix B.
The proof is a straightforward modification of the proof of Theorem 5. A similar result along the lines of Theorem 6 can also be proved.

5. Applications and Experiments

In this section, we perform experiments on both real and toy data, and show that the function proposed in (8) indeed recovers the kind of sparsity patterns we are interested in. First, we experiment with some toy data to understand the properties of the function $h(x)$ and, in turn, the solutions yielded by the optimization problem (5). Here, we take the opportunity to report results on linear regression problems as well. We then present results using two datasets from cognitive neuroscience and computational biology.
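Since the experiments rely on the solver of Section 3.2, it may help to see its two ingredients in code: the covariate duplication that turns overlapping groups into non-overlapping ones, and the proximal mapping composed of soft thresholding followed by group soft thresholding. The following is a minimal Python sketch under our own assumptions. The function names, the threshold arguments `t_l1` and `t_group` (which stand in for the two thresholds of the proximal step), and the zero-based group indices are illustrative and are not taken from the authors' implementation.

```python
import numpy as np

def duplicate_columns(Phi, groups):
    """Covariate duplication: copy each feature once per group it belongs to,
    so that the groups become disjoint in the expanded space."""
    cols = np.concatenate([np.asarray(g) for g in groups])
    sizes = [len(g) for g in groups]
    starts = np.cumsum([0] + sizes[:-1])
    new_groups = [np.arange(s, s + n) for s, n in zip(starts, sizes)]
    return Phi[:, cols], cols, new_groups

def recombine(w_expanded, cols, p):
    """Sum the duplicated coefficients back into the original p-dim space."""
    x = np.zeros(p)
    np.add.at(x, cols, w_expanded)
    return x

def soft_threshold(w, t):
    """Elementwise soft thresholding: sign(w) * max(|w| - t, 0)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def prox_sparse_group(w, groups, t_l1, t_group):
    """Soft threshold every entry, then shrink each (disjoint) group's norm."""
    w_tilde = soft_threshold(w, t_l1)
    out = np.zeros_like(w_tilde)
    for g in groups:
        norm = np.linalg.norm(w_tilde[g])
        if norm > 0:
            out[g] = w_tilde[g] * max(norm - t_group, 0.0) / norm
    return out

# Small usage example with two overlapping groups on four features.
Phi = np.arange(8, dtype=float).reshape(2, 4)
Phi_dup, cols, new_groups = duplicate_columns(Phi, [[0, 1, 2], [2, 3]])
w = prox_sparse_group(np.array([3.0, 0.5, -4.0, 0.2, 1.5]), new_groups, 1.0, 1.0)
x = recombine(w, cols, p=4)
```

A full solver would alternate a gradient step on the loss with `prox_sparse_group` in the expanded space and call `recombine` once at the end. The step size, stopping rule and debiasing step are left out here.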

5.1 The SOGlasso for Multitask Learning

The SOGlasso is motivated in part by multitask learning applications. The group lasso is a commonly used tool in multitask learning, and it encourages the same set of features to be selected across all tasks. As mentioned before, we wish to focus on a less restrictive version of multitask learning, where the main idea is to encourage sparsity patterns that are similar, but not identical, across tasks. Such a restriction corresponds to a scenario where the different tasks are related to each other, in that they use similar features, but are not exactly identical. This is accomplished by defining subsets of similar features and searching for solutions that select only a few subsets (common across tasks) and a sparse number of features within each subset (possibly different across tasks). Figure 1 shows an example of the patterns that typically arise in sparse multitask learning applications, along with the one we are interested in. We see that the SOGlasso, with its ability to select a few groups and only a few non-zero coefficients within those groups, lends itself well to the scenario we are interested in.

In the multitask learning setting, suppose the features are given by $\Phi_t$, for tasks $t = \{1, 2, \ldots, T\}$, and corresponding sparse vectors $x_t^\star \in \mathbb{R}^p$. These vectors can be arranged as columns of a matrix $X^\star$. Suppose we are now given $M$ groups $\tilde{\mathcal{G}} = \{\tilde{G}_1, \tilde{G}_2, \ldots\}$ with maximum size $\tilde{B}$. Note that the groups will now correspond to sets of rows of $X^\star$.

Let $x^\star = [x_1^{\star T} \ x_2^{\star T} \ \ldots \ x_T^{\star T}]^T \in \mathbb{R}^{Tp}$, and $y = [y_1^T \ y_2^T \ \ldots \ y_T^T]^T \in \mathbb{R}^{Tn}$. We also define $\mathcal{G} = \{G_1, G_2, \ldots, G_M\}$ to be the set of groups defined on $\mathbb{R}^{Tp}$ formed by aggregating the rows of $X$ that were originally in $\tilde{\mathcal{G}}$, so that $x$ is composed of groups $G \in \mathcal{G}$, and let the corresponding maximum group size be $B = T\tilde{B}$. By organizing the coefficients in this fashion, we can reduce the multitask learning problem into the standard form as considered in (5).
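The aggregation of row groups into groups on the stacked vector can be made concrete with a few lines of index arithmetic. The sketch below is our own illustration: it assumes zero-based indices and that task $t$ occupies positions $tp, \ldots, tp + p - 1$ of the stacked vector, and the function name `multitask_groups` is hypothetical.

```python
import numpy as np

def multitask_groups(row_groups, p, T):
    """Lift groups of rows (features) to groups on the stacked T*p vector.

    Feature j of task t is assumed to sit at index t * p + j, so each row
    group of size b becomes a stacked group of size T * b.
    """
    offsets = np.arange(T) * p
    lifted = []
    for g in row_groups:
        idx = np.asarray(g)[None, :] + offsets[:, None]
        lifted.append(np.sort(idx.ravel()))
    return lifted

# Two overlapping row groups on p = 3 features, shared by T = 2 tasks.
G = multitask_groups([[0, 1], [1, 2]], p=3, T=2)
print([g.tolist() for g in G])
```

With this layout the maximum group size on the stacked vector is $T$ times the maximum row-group size, matching $B = T\tilde{B}$ above.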
Hence, all the results we obtain in this paper can be extended to the multitask learning setting as well.

Results on fMRI dataset

In this experiment, we compared SOGlasso, lasso, standard multitask Glasso (with each feature grouped across tasks), the overlapping group lasso (Jacob et al., 2009) (with the same groups as in SOGlasso) and the Elastic Net (Zou and Hastie, 2005) in analysis of the star-plus dataset (Wang et al., 2003). 6 subjects made judgements that involved processing 40 sentences and 40 pictures while their brains were scanned in half-second intervals using fMRI (footnote 8). We retained the 16 time points following each stimulus, yielding 1280 measurements at each voxel. The task is to distinguish, at each point in time, which kind of stimulus a subject was processing. (Wang et al., 2003) showed that there exists cross-subject consistency in the cortical regions useful for prediction in this task. Specifically, experts partitioned each dataset into 24 non-overlapping regions of interest (ROIs), then reduced the data by discarding all but 7 ROIs and, for each subject, averaging the BOLD response across voxels within each ROI. With the resulting data, the authors showed that a classifier trained on data from 5 participants generalized above chance when applied to data from a 6th, thus proving some degree of consistency across subjects in how the different kinds of information were encoded.

8. Data and documentation available at

We assessed whether SOGlasso could leverage this cross-individual consistency to aid in the discovery of predictive voxels without requiring expert pre-selection of ROIs, or data reduction, or any alignment of voxels beyond that existing in the raw data. Note that, unlike (Wang et al., 2003), we do not aim to learn a solution that generalizes to a withheld subject. Rather, we aim to discover a group sparsity pattern that suggests a similar set of voxels in all subjects, before optimizing a separate solution for each individual. If SOGlasso can exploit cross-individual anatomical similarity from this raw, coarsely-aligned data, it should show reduced cross-validation error relative to the lasso applied separately to each individual. If the solution is sparse within groups and highly variable across individuals, SOGlasso should show reduced cross-validation error relative to Glasso. Finally, if SOGlasso is finding useful cross-individual structure, the features it selects should align at least somewhat with the expert-identified ROIs shown by (Wang et al., 2003) to carry consistent information.

We trained the 5 classifiers using 4-fold cross validation to select the regularization parameters, considering all available voxels without preselection. We grouped regions of voxels and considered overlapping groups shifted by 2 voxels in the first 2 dimensions (footnote 9). Figure 4 shows the prediction error (misclassification rate) of each classifier for every individual subject. SOGlasso shows the smallest error. The substantial gains over lasso indicate that the algorithm is successfully leveraging cross-subject consistency in the location of the informative features, allowing the model to avoid over-fitting individual subject data. We also note that the SOGlasso classifier, despite being trained without any voxel pre-selection, averaging, or alignment, performed comparably to the best-performing classifier reported by Wang et al.
(2003), which was trained on features averaged over 7 expert pre-selected ROIs.

To assess how well the clusters selected by SOGlasso align with the anatomical regions thought a priori to be involved in sentence and picture representation, we calculated the proportion of selected voxels falling within the 7 ROIs identified by (Wang et al., 2003) as relevant to the classification task (Table 3). For SOGlasso an average of 61.9% of identified voxels fell within these ROIs, significantly more than for lasso, group lasso (with or without overlap) and the elastic net. The overlapping group lasso, despite returning a very large number of predictors, hardly overlaps with the regions of interest to cognitive neuroscientists. The lasso and the elastic net make use of the fact that a separate classifier can be trained for each subject, but even in this case, the overlap with the regions of interest is low. The group lasso also fares badly in this regard, since the same voxels are forced to be selected across individuals, and this means that the regions of interest, which will be misaligned across subjects, will not in general be selected for each subject. All these drawbacks are circumvented by the SOGlasso. This shows that even without expert knowledge about the relevant regions of interest, our method partially succeeds in isolating the voxels that play a part in the classification task.

We make the following observations from Figure 4 and Figure 5.

The overlapping group lasso (Jacob et al., 2009) is ill suited for this problem. This is natural, since the premise is that the brains of different subjects can only be crudely aligned, and the overlapping group lasso will force the same voxel to be selected across

9. The irregular group size compensates for voxels being larger and scanner coverage being smaller in the z-dimension (only 8 slices relative to 64 in the x- and y-dimensions).

[Table 3, columns: Method, Avg. Overlap with ROI %; rows: OGlasso, ENet, Lasso, Glasso, SOGlasso]

Table 3: Mean sparsity levels of the methods considered, and the average overlap with the precomputed ROIs in (Wang et al., 2003).

[Figure 4, horizontal axis: O-Glasso, E-Net, Lasso, Glasso, SOGlasso]

Figure 4: Misclassification error on a hold-out set for different methods, on a per-subject basis. Each solid line connects the different errors obtained for a particular subject in the dataset.

all individuals. It will also force all the voxels in a group to be selected, which is again undesirable from our perspective. This leads to a high number of voxels selected, and a high error.

The elastic net (Zou and Hastie, 2005) treats each subject independently, and hence does not leverage the inter-subject similarity that we know exists across brains. The fact that all correlated voxels are also picked, coupled with a highly noisy signal, means that a large number of voxels are selected, and this not only makes the result hard to interpret, but also leads to a large generalization error.

The lasso (Tibshirani, 1996) is similar to the elastic net in that it does not leverage the inter-subject similarities. At the same time, it enforces sparsity in the solutions, and hence fewer voxels are selected across individuals. It allows any task-correlated voxel to be selected, regardless of its spatial location, and that leads to a highly distributed sparsity pattern (Figure 5(a)). It leads to a higher cross-validation error, indicating that the ungrouped voxels are inferior predictors. Like the elastic net, this leads to a poor generalization error (Figure 4). The distributed sparsity

pattern, low overlap with predetermined Regions of Interest, and the high error on the hold-out set are what we believe make the lasso a suboptimal procedure to use.

The group lasso (Lounici et al., 2009) groups a single voxel across individuals. This allows for taking into account the similarities between subjects, but not the minor differences across subjects. Like the overlapping group lasso, if a voxel is selected for one person, the same voxel is forced to be selected for all people. This means that if a voxel encodes picture or sentence in a particular subject, then the same voxel is forced to be selected across subjects, and can arbitrarily encode picture or sentence. This gives rise to a purple haze in Figure 5(b), and makes the result hard to interpret. The purple haze manifests itself due to the large number of ambiguous voxels in Figure 5(d).

Finally, the SOGlasso, as we have argued, helps in accounting for both the similarities and the differences across subjects. This leads to the learning of a code that is at the same time very sparse and hence interpretable, and leads to an error on the test set that is the best among the different methods considered. The SOGlasso (Figure 5(c)) overcomes the drawbacks of lasso and Glasso by allowing different voxels to be selected per group. This gives rise to a spatially clustered sparsity pattern, while at the same time selecting a negligible number of voxels that encode both picture and sentences (Figure 5(d)). Also, the resulting sparsity pattern has a larger overlap with the ROIs than other methods considered.

5.2 Toy Data, Linear Regression

Although not the primary focus of this paper, we show that the method we propose can also be applied to the linear regression setting. To this end, we consider simulated data and a multitask linear regression setting, and look to recover the coefficient matrix. We also use the simulated data to study the properties of the function we propose in (8).
The toy data is generated as follows: we consider $T = 20$ tasks, and consider overlapping groups of size $B = 6$. The groups are defined so that neighboring groups overlap ($G_1 = \{1, 2, \ldots, 6\}$, $G_2 = \{5, 6, \ldots, 10\}$, $G_3 = \{9, 10, \ldots, 14\}$, ...). We consider a case with $M = 100$ groups, and set $k = 10$ groups to be active. We vary the sparsity level $\alpha$ of the active groups and obtain $m = 100$ Gaussian linear measurements corrupted with Additive White Gaussian Noise of standard deviation $\sigma = 0.1$. We repeat this procedure 100 times and average the results. To generate the coefficient matrices $X^\star$, we select $k$ groups at random, and within the active groups, only retain a fraction $\alpha$ of the coefficients, again at random. The retained locations are then populated with uniform $[-1, 1]$ random variables. The regularization parameters were clairvoyantly picked to minimize the Mean Squared Error (MSE) over a range of parameter values. The results of applying lasso, standard latent group lasso (Jacob et al., 2009), Group lasso where each group corresponds to a row of the sparse matrix (Lounici et al., 2009), and our SOGlasso to these data are plotted in Figure 6(a), varying $\alpha$.

Figure 6(a) shows that, as the sparsity within the active group reduces (i.e. the active groups become more dense), the overlapping group lasso performs progressively better. This is because the overlapping group lasso does not account for sparsity within groups,

[Figure 5, panels: (a) Lasso, (b) Group Lasso, (c) SOG Lasso, (d) Voxel Encodings (%): percentage of selected voxels encoding PICTURE, SENTENCE or BOTH for OGLASSO, E-NET, LASSO, GLASSO and SOGLASSO]

Figure 5: [Best seen in color]. Aggregated sparsity patterns across subjects per brain slice. All the voxels selected across subjects in each slice are colored in red, blue or purple. Red indicates voxels that exhibit a picture response in at least one subject and never exhibit a sentence response. Blue indicates the opposite. Purple indicates voxels that exhibited a picture response in at least one subject and a sentence response in at least one more subject. (d) shows the percentage of selected voxels that encode picture, sentence or both.

[Figure 6, panels: (a) Varying $\alpha$: MSE versus $\alpha$ for LASSO, GLASSO, OGLASSO and SOGLASSO; (b) Sample pattern]

Figure 6: Figure (a) shows the result of varying $\alpha$. The SOGlasso accounts for both inter- and intra-group sparsity, and hence performs the best. The Glasso achieves good performance only when the active groups are non-sparse. Figure (b) shows a toy sparsity pattern, with different colors and brackets denoting different overlapping groups.

and hence the resulting solutions are far from the true solutions for small values of $\alpha$. The SOGlasso however does take this into account, and hence has a lower error when the active groups are sparse. Note that as $\alpha \to 1$, the SOGlasso approaches O-Glasso (Jacob et al., 2009). The Lasso (Tibshirani, 1996) does not account for group structure at all and performs poorly when $\alpha$ is large, whereas the Group lasso (Lounici et al., 2009) does not account for overlapping groups, and hence performs worse than O-Glasso and SOGlasso.

5.3 SOGlasso for Gene Selection

As explained in the introduction, another motivating application for the SOGlasso arises in computational biology, where one needs to predict whether a particular breast cancer tumor will lead to metastasis or not, from gene expression profiles. We used the breast cancer dataset compiled by (Van De Vijver et al., 2002) and grouped the genes into pathways as in (Subramanian et al., 2005). To make the dataset balanced, we perform a 3-way replication of one of the classes as in (Jacob et al., 2009), and also restrict our analysis to genes that are in at least one pathway. Again as in (Jacob et al., 2009), we ensure that all the replicates are in the same fold for cross validation. We do not perform any preprocessing of the data, other than the replication to balance the dataset. We compared our method to the standard lasso, and the overlapping group lasso.
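The replication-and-folds protocol described above can be sketched in a few lines. This is only an illustration under our own assumptions: we read "3-way replication" as keeping three copies of every sample of the chosen class, we assign folds by a random permutation, and the function name `replicate_with_folds` and its arguments are hypothetical.

```python
import numpy as np

def replicate_with_folds(y, label_to_replicate, copies=3, n_folds=4, seed=0):
    """Replicate samples of one class and assign cross-validation folds so
    that all copies of an original sample land in the same fold.

    Returns (idx, folds): idx indexes into the original samples (with
    repeats) and folds gives the fold id of each replicated sample.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # One fold id per ORIGINAL sample, roughly balanced across folds.
    fold_of_original = rng.permutation(n) % n_folds
    reps = np.where(y == label_to_replicate, copies, 1)
    idx = np.repeat(np.arange(n), reps)
    return idx, fold_of_original[idx]

# Toy labels: 2 samples of class +1 and 6 samples of class -1.
y = np.array([1, -1, -1, 1, -1, -1, -1, -1])
idx, folds = replicate_with_folds(y, label_to_replicate=1)
```

Because the fold id is drawn per original sample before replication, no replicate of a training sample can appear in the corresponding validation fold.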
The standard group lasso (Yuan and Lin, 2006) is ill-suited for this experiment, since the groups overlap and the sparsity pattern we expect is a union of groups, and it has been shown that the group lasso method will not recover the signal in such cases.

We trained a model using 4-fold cross validation on 80% of the data, and used the remaining 20% as a final test set. Table 4 shows the results obtained. We see that the SOGlasso penalty leads to lower classification errors as compared to the lasso or the latent group lasso. The errors reported are the ones obtained on the final (held out) test set. We refrain from performing simple ridge regression (Hoerl and Kennard, 1970) since the


MARKOV CHAINS AND MARKOV DECISION THEORY. Contents MARKOV CHAINS AND MARKOV DECISION THEORY ARINDRIMA DATTA Abstract. In this paper, we begin with a forma introduction to probabiity and expain the concept of random variabes and stochastic processes. After

More information

Paragraph Topic Classification

Paragraph Topic Classification Paragraph Topic Cassification Eugene Nho Graduate Schoo of Business Stanford University Stanford, CA 94305 enho@stanford.edu Edward Ng Department of Eectrica Engineering Stanford University Stanford, CA

More information

Do Schools Matter for High Math Achievement? Evidence from the American Mathematics Competitions Glenn Ellison and Ashley Swanson Online Appendix

Do Schools Matter for High Math Achievement? Evidence from the American Mathematics Competitions Glenn Ellison and Ashley Swanson Online Appendix VOL. NO. DO SCHOOLS MATTER FOR HIGH MATH ACHIEVEMENT? 43 Do Schoos Matter for High Math Achievement? Evidence from the American Mathematics Competitions Genn Eison and Ashey Swanson Onine Appendix Appendix

More information

Explicit overall risk minimization transductive bound

Explicit overall risk minimization transductive bound 1 Expicit overa risk minimization transductive bound Sergio Decherchi, Paoo Gastado, Sandro Ridea, Rodofo Zunino Dept. of Biophysica and Eectronic Engineering (DIBE), Genoa University Via Opera Pia 11a,

More information

XSAT of linear CNF formulas

XSAT of linear CNF formulas XSAT of inear CN formuas Bernd R. Schuh Dr. Bernd Schuh, D-50968 Kön, Germany; bernd.schuh@netcoogne.de eywords: compexity, XSAT, exact inear formua, -reguarity, -uniformity, NPcompeteness Abstract. Open

More information

arxiv: v1 [math.co] 17 Dec 2018

arxiv: v1 [math.co] 17 Dec 2018 On the Extrema Maximum Agreement Subtree Probem arxiv:1812.06951v1 [math.o] 17 Dec 2018 Aexey Markin Department of omputer Science, Iowa State University, USA amarkin@iastate.edu Abstract Given two phyogenetic

More information

II. PROBLEM. A. Description. For the space of audio signals

II. PROBLEM. A. Description. For the space of audio signals CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time

More information

arxiv: v1 [cs.lg] 31 Oct 2017

arxiv: v1 [cs.lg] 31 Oct 2017 ACCELERATED SPARSE SUBSPACE CLUSTERING Abofaz Hashemi and Haris Vikao Department of Eectrica and Computer Engineering, University of Texas at Austin, Austin, TX, USA arxiv:7.26v [cs.lg] 3 Oct 27 ABSTRACT

More information

Statistical Learning Theory: a Primer

Statistical Learning Theory: a Primer ??,??, 1 6 (??) c?? Kuwer Academic Pubishers, Boston. Manufactured in The Netherands. Statistica Learning Theory: a Primer THEODOROS EVGENIOU AND MASSIMILIANO PONTIL Center for Bioogica and Computationa

More information

Asynchronous Control for Coupled Markov Decision Systems

Asynchronous Control for Coupled Markov Decision Systems INFORMATION THEORY WORKSHOP (ITW) 22 Asynchronous Contro for Couped Marov Decision Systems Michae J. Neey University of Southern Caifornia Abstract This paper considers optima contro for a coection of

More information

Data Mining Technology for Failure Prognostic of Avionics

Data Mining Technology for Failure Prognostic of Avionics IEEE Transactions on Aerospace and Eectronic Systems. Voume 38, #, pp.388-403, 00. Data Mining Technoogy for Faiure Prognostic of Avionics V.A. Skormin, Binghamton University, Binghamton, NY, 1390, USA

More information

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS ISEE 1 SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS By Yingying Fan and Jinchi Lv University of Southern Caifornia This Suppementary Materia

More information

C. Fourier Sine Series Overview

C. Fourier Sine Series Overview 12 PHILIP D. LOEWEN C. Fourier Sine Series Overview Let some constant > be given. The symboic form of the FSS Eigenvaue probem combines an ordinary differentia equation (ODE) on the interva (, ) with a

More information

Separation of Variables and a Spherical Shell with Surface Charge

Separation of Variables and a Spherical Shell with Surface Charge Separation of Variabes and a Spherica She with Surface Charge In cass we worked out the eectrostatic potentia due to a spherica she of radius R with a surface charge density σθ = σ cos θ. This cacuation

More information

Efficiently Generating Random Bits from Finite State Markov Chains

Efficiently Generating Random Bits from Finite State Markov Chains 1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Brandon Maone Department of Computer Science University of Hesini February 18, 2014 Abstract This document derives, in excrutiating

More information

From Margins to Probabilities in Multiclass Learning Problems

From Margins to Probabilities in Multiclass Learning Problems From Margins to Probabiities in Muticass Learning Probems Andrea Passerini and Massimiiano Ponti 2 and Paoo Frasconi 3 Abstract. We study the probem of muticass cassification within the framework of error

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Schoo of Computer Science Probabiistic Graphica Modes Gaussian graphica modes and Ising modes: modeing networks Eric Xing Lecture 0, February 0, 07 Reading: See cass website Eric Xing @ CMU, 005-07 Network

More information

Maximizing Sum Rate and Minimizing MSE on Multiuser Downlink: Optimality, Fast Algorithms and Equivalence via Max-min SIR

Maximizing Sum Rate and Minimizing MSE on Multiuser Downlink: Optimality, Fast Algorithms and Equivalence via Max-min SIR 1 Maximizing Sum Rate and Minimizing MSE on Mutiuser Downink: Optimaity, Fast Agorithms and Equivaence via Max-min SIR Chee Wei Tan 1,2, Mung Chiang 2 and R. Srikant 3 1 Caifornia Institute of Technoogy,

More information

Active Learning & Experimental Design

Active Learning & Experimental Design Active Learning & Experimenta Design Danie Ting Heaviy modified, of course, by Lye Ungar Origina Sides by Barbara Engehardt and Aex Shyr Lye Ungar, University of Pennsyvania Motivation u Data coection

More information

Limits on Support Recovery with Probabilistic Models: An Information-Theoretic Framework

Limits on Support Recovery with Probabilistic Models: An Information-Theoretic Framework Limits on Support Recovery with Probabiistic Modes: An Information-Theoretic Framewor Jonathan Scarett and Voan Cevher arxiv:5.744v3 cs.it 3 Aug 6 Abstract The support recovery probem consists of determining

More information

Melodic contour estimation with B-spline models using a MDL criterion

Melodic contour estimation with B-spline models using a MDL criterion Meodic contour estimation with B-spine modes using a MDL criterion Damien Loive, Ney Barbot, Oivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-305 Lannion Cedex

More information

Statistics for Applications. Chapter 7: Regression 1/43

Statistics for Applications. Chapter 7: Regression 1/43 Statistics for Appications Chapter 7: Regression 1/43 Heuristics of the inear regression (1) Consider a coud of i.i.d. random points (X i,y i ),i =1,...,n : 2/43 Heuristics of the inear regression (2)

More information

Stochastic Variational Inference with Gradient Linearization

Stochastic Variational Inference with Gradient Linearization Stochastic Variationa Inference with Gradient Linearization Suppementa Materia Tobias Pötz * Anne S Wannenwetsch Stefan Roth Department of Computer Science, TU Darmstadt Preface In this suppementa materia,

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, as we as various

More information

Multilayer Kerceptron

Multilayer Kerceptron Mutiayer Kerceptron Zotán Szabó, András Lőrincz Department of Information Systems, Facuty of Informatics Eötvös Loránd University Pázmány Péter sétány 1/C H-1117, Budapest, Hungary e-mai: szzoi@csetehu,

More information

Distributed average consensus: Beyond the realm of linearity

Distributed average consensus: Beyond the realm of linearity Distributed average consensus: Beyond the ream of inearity Usman A. Khan, Soummya Kar, and José M. F. Moura Department of Eectrica and Computer Engineering Carnegie Meon University 5 Forbes Ave, Pittsburgh,

More information

Problem set 6 The Perron Frobenius theorem.

Problem set 6 The Perron Frobenius theorem. Probem set 6 The Perron Frobenius theorem. Math 22a4 Oct 2 204, Due Oct.28 In a future probem set I want to discuss some criteria which aow us to concude that that the ground state of a sef-adjoint operator

More information

Efficient Similarity Search across Top-k Lists under the Kendall s Tau Distance

Efficient Similarity Search across Top-k Lists under the Kendall s Tau Distance Efficient Simiarity Search across Top-k Lists under the Kenda s Tau Distance Koninika Pa TU Kaisersautern Kaisersautern, Germany pa@cs.uni-k.de Sebastian Miche TU Kaisersautern Kaisersautern, Germany smiche@cs.uni-k.de

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, we as various

More information

Cryptanalysis of PKP: A New Approach

Cryptanalysis of PKP: A New Approach Cryptanaysis of PKP: A New Approach Éiane Jaumes and Antoine Joux DCSSI 18, rue du Dr. Zamenhoff F-92131 Issy-es-Mx Cedex France eiane.jaumes@wanadoo.fr Antoine.Joux@ens.fr Abstract. Quite recenty, in

More information

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm 1 Asymptotic Properties of a Generaized Cross Entropy Optimization Agorithm Zijun Wu, Michae Koonko, Institute for Appied Stochastics and Operations Research, Caustha Technica University Abstract The discrete

More information

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones ASummaryofGaussianProcesses Coryn A.L. Baier-Jones Cavendish Laboratory University of Cambridge caj@mrao.cam.ac.uk Introduction A genera prediction probem can be posed as foows. We consider that the variabe

More information

u(x) s.t. px w x 0 Denote the solution to this problem by ˆx(p, x). In order to obtain ˆx we may simply solve the standard problem max x 0

u(x) s.t. px w x 0 Denote the solution to this problem by ˆx(p, x). In order to obtain ˆx we may simply solve the standard problem max x 0 Bocconi University PhD in Economics - Microeconomics I Prof M Messner Probem Set 4 - Soution Probem : If an individua has an endowment instead of a monetary income his weath depends on price eves In particuar,

More information

THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE

THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE KATIE L. MAY AND MELISSA A. MITCHELL Abstract. We show how to identify the minima path network connecting three fixed points on

More information

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction Akaike Information Criterion for ANOVA Mode with a Simpe Order Restriction Yu Inatsu * Department of Mathematics, Graduate Schoo of Science, Hiroshima University ABSTRACT In this paper, we consider Akaike

More information

Chemical Kinetics Part 2

Chemical Kinetics Part 2 Integrated Rate Laws Chemica Kinetics Part 2 The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates the rate

More information

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries c 26 Noninear Phenomena in Compex Systems First-Order Corrections to Gutzwier s Trace Formua for Systems with Discrete Symmetries Hoger Cartarius, Jörg Main, and Günter Wunner Institut für Theoretische

More information

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1 Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow

More information

Learning Structural Changes of Gaussian Graphical Models in Controlled Experiments

Learning Structural Changes of Gaussian Graphical Models in Controlled Experiments Learning Structura Changes of Gaussian Graphica Modes in Controed Experiments Bai Zhang and Yue Wang Bradey Department of Eectrica and Computer Engineering Virginia Poytechnic Institute and State University

More information

FRIEZE GROUPS IN R 2

FRIEZE GROUPS IN R 2 FRIEZE GROUPS IN R 2 MAXWELL STOLARSKI Abstract. Focusing on the Eucidean pane under the Pythagorean Metric, our goa is to cassify the frieze groups, discrete subgroups of the set of isometries of the

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

Power Control and Transmission Scheduling for Network Utility Maximization in Wireless Networks

Power Control and Transmission Scheduling for Network Utility Maximization in Wireless Networks ower Contro and Transmission Scheduing for Network Utiity Maximization in Wireess Networks Min Cao, Vivek Raghunathan, Stephen Hany, Vinod Sharma and. R. Kumar Abstract We consider a joint power contro

More information

Global Optimality Principles for Polynomial Optimization Problems over Box or Bivalent Constraints by Separable Polynomial Approximations

Global Optimality Principles for Polynomial Optimization Problems over Box or Bivalent Constraints by Separable Polynomial Approximations Goba Optimaity Principes for Poynomia Optimization Probems over Box or Bivaent Constraints by Separabe Poynomia Approximations V. Jeyakumar, G. Li and S. Srisatkunarajah Revised Version II: December 23,

More information

Ensemble Online Clustering through Decentralized Observations

Ensemble Online Clustering through Decentralized Observations Ensembe Onine Custering through Decentraized Observations Dimitrios Katseis, Caroyn L. Bec and Mihaea van der Schaar Abstract We investigate the probem of onine earning for an ensembe of agents custering

More information

NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS

NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS TONY ALLEN, EMILY GEBHARDT, AND ADAM KLUBALL 3 ADVISOR: DR. TIFFANY KOLBA 4 Abstract. The phenomenon of noise-induced stabiization occurs

More information

https://doi.org/ /epjconf/

https://doi.org/ /epjconf/ HOW TO APPLY THE OPTIMAL ESTIMATION METHOD TO YOUR LIDAR MEASUREMENTS FOR IMPROVED RETRIEVALS OF TEMPERATURE AND COMPOSITION R. J. Sica 1,2,*, A. Haefee 2,1, A. Jaai 1, S. Gamage 1 and G. Farhani 1 1 Department

More information

Support Vector Machine and Its Application to Regression and Classification

Support Vector Machine and Its Application to Regression and Classification BearWorks Institutiona Repository MSU Graduate Theses Spring 2017 Support Vector Machine and Its Appication to Regression and Cassification Xiaotong Hu As with any inteectua project, the content and views

More information

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems Componentwise Determination of the Interva Hu Soution for Linear Interva Parameter Systems L. V. Koev Dept. of Theoretica Eectrotechnics, Facuty of Automatics, Technica University of Sofia, 1000 Sofia,

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Approximated MLC shape matrix decomposition with intereaf coision constraint Thomas Kainowski Antje Kiese Abstract Shape matrix decomposition is a subprobem in radiation therapy panning. A given fuence

More information

Appendix of the Paper The Role of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Model

Appendix of the Paper The Role of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Model Appendix of the Paper The Roe of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Mode Caio Ameida cameida@fgv.br José Vicente jose.vaentim@bcb.gov.br June 008 1 Introduction In this

More information

DIFFERENCE-OF-CONVEX LEARNING: DIRECTIONAL STATIONARITY, OPTIMALITY, AND SPARSITY

DIFFERENCE-OF-CONVEX LEARNING: DIRECTIONAL STATIONARITY, OPTIMALITY, AND SPARSITY SIAM J. OPTIM. Vo. 7, No. 3, pp. 1637 1665 c 017 Society for Industria and Appied Mathematics DIFFERENCE-OF-CONVEX LEARNING: DIRECTIONAL STATIONARITY, OPTIMALITY, AND SPARSITY MIJU AHN, JONG-SHI PANG,

More information

The Group Structure on a Smooth Tropical Cubic

The Group Structure on a Smooth Tropical Cubic The Group Structure on a Smooth Tropica Cubic Ethan Lake Apri 20, 2015 Abstract Just as in in cassica agebraic geometry, it is possibe to define a group aw on a smooth tropica cubic curve. In this note,

More information

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity

More information

Target Location Estimation in Wireless Sensor Networks Using Binary Data

Target Location Estimation in Wireless Sensor Networks Using Binary Data Target Location stimation in Wireess Sensor Networks Using Binary Data Ruixin Niu and Pramod K. Varshney Department of ectrica ngineering and Computer Science Link Ha Syracuse University Syracuse, NY 344

More information

Mixed Volume Computation, A Revisit

Mixed Volume Computation, A Revisit Mixed Voume Computation, A Revisit Tsung-Lin Lee, Tien-Yien Li October 31, 2007 Abstract The superiority of the dynamic enumeration of a mixed ces suggested by T Mizutani et a for the mixed voume computation

More information

AST 418/518 Instrumentation and Statistics

AST 418/518 Instrumentation and Statistics AST 418/518 Instrumentation and Statistics Cass Website: http://ircamera.as.arizona.edu/astr_518 Cass Texts: Practica Statistics for Astronomers, J.V. Wa, and C.R. Jenkins, Second Edition. Measuring the

More information

8 Digifl'.11 Cth:uits and devices

8 Digifl'.11 Cth:uits and devices 8 Digif'. Cth:uits and devices 8. Introduction In anaog eectronics, votage is a continuous variabe. This is usefu because most physica quantities we encounter are continuous: sound eves, ight intensity,

More information

Generalized multigranulation rough sets and optimal granularity selection

Generalized multigranulation rough sets and optimal granularity selection Granu. Comput. DOI 10.1007/s41066-017-0042-9 ORIGINAL PAPER Generaized mutigranuation rough sets and optima granuarity seection Weihua Xu 1 Wentao Li 2 Xiantao Zhang 1 Received: 27 September 2016 / Accepted:

More information

An Algorithm for Pruning Redundant Modules in Min-Max Modular Network

An Algorithm for Pruning Redundant Modules in Min-Max Modular Network An Agorithm for Pruning Redundant Modues in Min-Max Moduar Network Hui-Cheng Lian and Bao-Liang Lu Department of Computer Science and Engineering, Shanghai Jiao Tong University 1954 Hua Shan Rd., Shanghai

More information

Chemical Kinetics Part 2. Chapter 16

Chemical Kinetics Part 2. Chapter 16 Chemica Kinetics Part 2 Chapter 16 Integrated Rate Laws The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates

More information

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM MIKAEL NILSSON, MATTIAS DAHL AND INGVAR CLAESSON Bekinge Institute of Technoogy Department of Teecommunications and Signa Processing

More information

Efficient Generation of Random Bits from Finite State Markov Chains

Efficient Generation of Random Bits from Finite State Markov Chains Efficient Generation of Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

Optimality of Inference in Hierarchical Coding for Distributed Object-Based Representations

Optimality of Inference in Hierarchical Coding for Distributed Object-Based Representations Optimaity of Inference in Hierarchica Coding for Distributed Object-Based Representations Simon Brodeur, Jean Rouat NECOTIS, Département génie éectrique et génie informatique, Université de Sherbrooke,

More information

Convolutional Networks 2: Training, deep convolutional networks

Convolutional Networks 2: Training, deep convolutional networks Convoutiona Networks 2: Training, deep convoutiona networks Hakan Bien Machine Learning Practica MLP Lecture 8 30 October / 6 November 2018 MLP Lecture 8 / 30 October / 6 November 2018 Convoutiona Networks

More information

A Novel Learning Method for Elman Neural Network Using Local Search

A Novel Learning Method for Elman Neural Network Using Local Search Neura Information Processing Letters and Reviews Vo. 11, No. 8, August 2007 LETTER A Nove Learning Method for Eman Neura Networ Using Loca Search Facuty of Engineering, Toyama University, Gofuu 3190 Toyama

More information

A Separability Index for Distance-based Clustering and Classification Algorithms

A Separability Index for Distance-based Clustering and Classification Algorithms IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL.?, NO.?, JANUARY 1? 1 A Separabiity Index for Distance-based Custering and Cassification Agorithms Arka P. Ghosh, Ranjan Maitra and Anna

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Agorithmic Operations Research Vo.4 (29) 49 57 Approximated MLC shape matrix decomposition with intereaf coision constraint Antje Kiese and Thomas Kainowski Institut für Mathematik, Universität Rostock,

More information

Bayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with?

Bayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with? Bayesian Learning A powerfu and growing approach in machine earning We use it in our own decision making a the time You hear a which which coud equay be Thanks or Tanks, which woud you go with? Combine

More information

Coded Caching for Files with Distinct File Sizes

Coded Caching for Files with Distinct File Sizes Coded Caching for Fies with Distinct Fie Sizes Jinbei Zhang iaojun Lin Chih-Chun Wang inbing Wang Department of Eectronic Engineering Shanghai Jiao ong University China Schoo of Eectrica and Computer Engineering

More information

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES Separation of variabes is a method to sove certain PDEs which have a warped product structure. First, on R n, a inear PDE of order m is

More information

Two view learning: SVM-2K, Theory and Practice

Two view learning: SVM-2K, Theory and Practice Two view earning: SVM-2K, Theory and Practice Jason D.R. Farquhar jdrf99r@ecs.soton.ac.uk Hongying Meng hongying@cs.york.ac.uk David R. Hardoon drh@ecs.soton.ac.uk John Shawe-Tayor jst@ecs.soton.ac.uk

More information

Learning Fully Observed Undirected Graphical Models

Learning Fully Observed Undirected Graphical Models Learning Fuy Observed Undirected Graphica Modes Sides Credit: Matt Gormey (2016) Kayhan Batmangheich 1 Machine Learning The data inspires the structures we want to predict Inference finds {best structure,

More information

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University Turbo Codes Coding and Communication Laboratory Dept. of Eectrica Engineering, Nationa Chung Hsing University Turbo codes 1 Chapter 12: Turbo Codes 1. Introduction 2. Turbo code encoder 3. Design of intereaver

More information

Lasso and probabilistic inequalities for multivariate point processes

Lasso and probabilistic inequalities for multivariate point processes Submitted to the Bernoui arxiv: arxiv:128.57 Lasso and probabiistic inequaities for mutivariate point processes NIELS RICHARD HANSEN 1, PATRICIA REYNAUD-BOURET 2 and VINCENT RIVOIRARD 3 1 Department of

More information

Testing for the Existence of Clusters

Testing for the Existence of Clusters Testing for the Existence of Custers Caudio Fuentes and George Casea University of Forida November 13, 2008 Abstract The detection and determination of custers has been of specia interest, among researchers

More information

Optimal Control of Assembly Systems with Multiple Stages and Multiple Demand Classes 1

Optimal Control of Assembly Systems with Multiple Stages and Multiple Demand Classes 1 Optima Contro of Assemby Systems with Mutipe Stages and Mutipe Demand Casses Saif Benjaafar Mohsen EHafsi 2 Chung-Yee Lee 3 Weihua Zhou 3 Industria & Systems Engineering, Department of Mechanica Engineering,

More information

Lecture Note 3: Stationary Iterative Methods

Lecture Note 3: Stationary Iterative Methods MATH 5330: Computationa Methods of Linear Agebra Lecture Note 3: Stationary Iterative Methods Xianyi Zeng Department of Mathematica Sciences, UTEP Stationary Iterative Methods The Gaussian eimination (or

More information

Rate-Distortion Theory of Finite Point Processes

Rate-Distortion Theory of Finite Point Processes Rate-Distortion Theory of Finite Point Processes Günther Koiander, Dominic Schuhmacher, and Franz Hawatsch, Feow, IEEE Abstract We study the compression of data in the case where the usefu information

More information

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION
