STATISTICAL APPROACHES FOR MATCHING THE COMPONENTS OF COMPLEX MICROBIAL COMMUNITIES

Size: px
Start display at page:

Download "STATISTICAL APPROACHES FOR MATCHING THE COMPONENTS OF COMPLEX MICROBIAL COMMUNITIES"

Transcription

1 STATISTICAL APPROACHES FOR MATCHING THE COMPONENTS OF COMPLEX MICROBIAL COMMUNITIES by Chongci Tang Submitted in partia fufiment of the requirements for the degree of Master of Science at Dahousie University Haifax, Nova Scotia May 2016 c Copyright by Chongci Tang, 2016

2 Tabe of Contents List of Tabes iv List of Figures v Abstract List of Abbreviations and Symbos Used vi vii Acknowedgements viii Chapter 1 Introduction Background BiomeNet The chaenge of appying BiomeNet to rea data Structure of the thesis Chapter 2 Methods for matching the metaboic components of compex microbia communities LASSO Regression Review of the method An exampe of LASSO fitting NNLS Regression Review of the method An exampe of NNLS fitting A method based on minimizing JSD The method An exampe of JSD minimization method Summary Chapter 3 Resuts for matching the metaboic components of compex microbia communities RSS and JSD of a the matches Quaity of the matches ii

3 3.3 Heatmaps of the estimated coefficient matrix Chapter 4 Discussion and suggestions for future work Bibiography iii

4 List of Tabes 1.1 Mathematica definition of the pate diagram The first 30 steps of the LASSO path on Y The updated parameters β for MJSD method on Y ˆβj, RSS, JSD of three methods on Y The number of the significant matches among a 100 matches. 37 iv

5 List of Figures 1.1 Pate diagram for the BiomeNet mode The Change of coefficients through first 15 steps of LASSO path on Y LASSO coefficients of the seected best LASSO mode on Y Change of RSS for different threshoding modes of Y The coefficients of the NNLS mode of Y 1 after threshoding JSD of Y 1 to a 200 X variabes Coefficients of the variabes resuted from the MJSD method on Y The positive reactions of a the 11 X variabes Some y s and ŷ s of three methods on Y RSS and JSD of a 100 matches for three methods Some y s and ŷ s of three methods on Y Some y s and ŷ s of three methods on Y RSS and JSD permutation histogram of Y 49 for the three methods RSS and JSD permutation histogram of Y 30 for the three methods The number of positive coefficients of a 100 matches for three methods Heatmap of the ordered coefficients for the MJSD v

6 Abstract A hierarchica Bayesian mode caed BiomeNet can be used to identify the functiona units of metaboic reactions, subnetworks and community-eve metaboic networks. The framework modes metaboic structures by assuming each sampe consists of many tighty connected subnetworks, which in turn are comprised of different reactions. When appying the method the number of subnetworks L must be pre-specified When L is set arger in BiomeNet, the inferred structures of the subnetwoks are expected to come out in a more trivia form. Three methods, LASSO, NNLS and a new method, MJSD, are appied to match a subnetwork in one anaysis (say, when L=100) with severa subnetworks when L is increased (say, L=200). RSS and JSD are appied as matching criteria to conduct mutipe tests to judge the significance of the matches. From the resuts, I am abe to identify those predominant subnetworks and give a reasonabe conjecture that those predominant subnetworks aways come out as unbroken bocks for any arger L vaues. vi

7 List of Abbreviations and Symbos Used Symbos and Abbr. X Y Y j X j y i β JSD RSS NNLS LASSO LDA S-R LAR df MCMC PCA IBD KEGG MJSD FDR B-H procedure Description 2824 by 200 data matrix 2824 by 100 data matrix j th coumn in Y j th coumn in X i th eement in Y coefficients vector Jensen Shannon divergence Residua sum of squares Non-negative east square east absoute shrinkage and seection operator Latent Dirichet Aocation substrate-product east ange regression degrees of freedom Markov chain Monte Caro Principe Component Anaysis Infammatory bowe disease Kyoto Encycopedia of Genes and Genomes the new method based on minimizing JSD Fase Discovery Rate Benjamin-Hochberg procedure vii

8 Acknowedgements I woud ike to express my huge gratitude to my supervisor, Dr. Hong,Gu and cosupervisor Dr. Joseph Bieawski, for their guidance and encouragement for the two years. They spent a ot of time on instructing me and revise my thesis even though they are reay busy. Without their support this thesis woud not have been possibe. I woud aso ike to thank my committee members, Dr. Toby Kenney and Dr. Bruce Smith for their kind comments and advice on my thesis. I woud ike to give my thanks to my friends and everyone ese who heped me. My thanks aso goes to my professors and cassmates in Dahousie University. viii

9 Chapter 1 Introduction 1.1 Background From protists to humans, a pants and animas ive in cose association with microbia organisms. These microbes are beieved to pay a critica roe in a wide variety of settings, from gobay significant nutrient cycing to infuencing human physioogy [16]. The ecoogica community of commensa, symbiotic and pathogenic microorganisms that iteray share the human body space have a profound effect on human heath [11]. For this reason, in human microbiomics, research is focused on the reationship of the microbia communities to both heath and disease status. In ate 1990s, researchers found that the microbiome in the gut payed a roe in the deveopment and maintenance of the human immune system, and the human microbiome is now impicated in a variety of autoimmune diseases ike diabetes [10] [22]. A poor mix of microbes in the gut may aso aggravate common conditions such as obesity [20]. The consensus opinion is that athough the mammaian immune system seems to be designed to contro microorganisms, it is in fact controed by microorganisms [12]. Most microbes in most environments are not cutivatabe, so studies of microbia communities must be made based on DNA samped directy from the environment, athough RNA, protein and metaboite based studies may aso be performed [6]. There are two ways to use environmenta DNA to study microbia communities. One is targeted ampicon studies (e.g., 16S SSU rdna), which focuses on a known marker gene and is primariy informative about taxonomy [21]. The other that appeared more recenty is shotgun metagenomics, which can be used to study the functiona potentia of the community [21]. Shotgun sequencing of tota environmenta DNA proceeds by randomy cutting the tota DNA within a sampe, and sequencing the many short fragments that are produced [14]. The resuting coection of DNA sequences represents the genomes of a the microbes in the sampe, and is referred to as the metagenome of the samped microbiota. The function of some of those 1

10 2 sequences can be inferred via homoogy to reference sequence with known functions (e.g., through the KEGG database) [14]. However, ony a fraction of the gene sequences can be assigned a function; the function of a very arge number of sequences, wi remain unknown [14]. Of particuar interest are those sequences that encode a known enzyme, as they can be used to infer the community-eve metaboic potentia of the microbiota. The compete set of metaboic and physica processes of a ce, caed a metaboic network, determines the physioogica and biochemica properties of the bacterium. The fundamenta units of a metaboic network are its chemica reactions. The essentia components of a reaction are the substrates (chemica compounds), the enzyme (a protein encoded by DNA) that wi act upon the substrates, and the products that are produced by the action of the enzyme on its substrates (other chemica compounds). It is important to note that the products of one enzyme-mediated reaction can serve as the substrate for a different enzyme-mediated reaction. Such chemica reactions are organized into subnetworks (functiona modues), which are themseves organized into the higher-eve structures that bioogists refer to as the metaboic network. In the fied of microbiomics, the notion of the metaboic network is extended to incude the fu metaboic capacity, and the fu set of interactions, carried out by a community of microbes. In order to infer differentia usage of metaboic networks among ecoogicay divergent microbia communities, a Bayesian modeing approach caed BiomeNet was deveoped by Shafiei et a. [16]. The modeing framework differs from previous approaches [e.g., PCA and its variants] in attempting to capture the hierarchica nature of rea metaboic networks [22]. Specificay, sets of reactions (reated by expoiting a shared poo of substrates and products) are modeed as a metaboic subnetwork, and over-apping mixtures of such metaboic subnetworks are used to mode the fu metaboic network [16]. When such networks are inferred from the mode (as opposed to being defined soey according to biochemica expertise) they are caed metabosystems [16]. Metabosystems are intended to represent hypothetica metaboic phenotypes, and sampes are permitted to consist of overapping mixtures of metabosystems.

11 3 Currenty, the most widey used method to anayze community metaboic networks, and metagenomic variation in genera, is Principe Component Anaysis (PCA) and it variants [16]. BiomeNet is fundamentay different from such methods, in providing a framework to take advantage of dependencies between reactions encoded by metaboic networks without any transformation and reduction. This means that resuts obtained under BiomeNet offer bioogists a direct functiona interpretation. Thus, BiomeNet contributes a vauabe addition to PCA [16]. BiomeNet has been appied to a variety of datasets, incuding marine water sampes, mammaian gut microbiomes and the gut microbiomes of infammatory bowe disease (IBD) patients [16]. The case of IBD iustrates why a direct functiona interpretation of the resuts is critica to microbiome researchers. IBD is a human disease characterized by chronic immune dysreguation in the gut. Using BiomeNet, Shafiei et a [16] found that IBD patients differed from heathy individuas according to the mixture of metabosystems found in their gut. Interestingy, the metabosystem having a greater prevaence among IBD patients was characterized by metaboic reactions associated with (i) cose association with the human gut epitheium, (ii) resistance to dietary intervention, and (iii) interference with uptake of antioxidants connected to IBD [16]. IBD is a serious disease; it can cause severey disruptive pain, require surgery, and even cause increased mortaity. Even more pressing is that it is on the increase wordwide, with Canada having among the highest rates. Thus, it is important that BiomeNet produces resuts that have a direct reationship to microbia metaboic capacity, and the posterior mixture weights have a straightforward interpretation. Inference under BiomeNet wi be focus of this thesis, and the framework is described in more detai in the next section. 1.2 BiomeNet BiomeNet is a hierarchica mixed-membership Bayesian mode simiar to Latent Dirichet Aocation (LDA). LDA empoys a mixture of Dirichet priors to faciitate custering, or cassification. Standard LDA, however, cannot be used to resove the underying structure of a metaboic network in terms of easiy interpretabe parts such as metaboic pathways or subnetworks. In the metagenomic setting, the observed data (produced by shotgun sequencing) is the abundance of enzyme-encoding sequence

12 4 reads in each sampe; but, for input into BiomeNet the data must be converted to counts of substrate-product pairs. To do this, where possibe, each enzyme-encoding sequence is assigned to its enzyme within the EC database. The EC database provides information about the biochemica reaction that the enzyme catayzes, incuding a of the substrates and products for that reaction. Each reaction is decomposed into a possibe substrates and products (this eads to a mini hyper-graph for each reaction), and a pairs of compounds are assigned the abundance of the associated enzymeencoding sequence within the sampe. As mammaian gut metagenomes can encode thousands of unique enzymes, the input data for BiomeNet is rich in information reevant to metaboic interactions. BiomeNet is used to mode metaboic interactions at the community-eve. The mode assumes that each microbiome sampe is a mixture of K metabosystems, where K is assumed to be known and fixed. However, the contribution of the K metabosystems is permitted to differ between sampes. Each metabosystem is itsef viewed as a mixture of a fixed number (L) of metaboic subnetworks. Hence, metabosystems differ according to their particuar mixture of subnetworks, with the subnetworks viewed as a mixture of metaboic reactions. Because reactions within a subnetwork are inked through shared chemica compounds, the subnetworks are actuay modeed as a subset of compounds that are converted to another subset of compounds. This is why each enzymatic reaction must be decomposed into substrate-product pairs (S-R pairs), with each subnetwork having its own substrates and products. The ower eves of the hierarchy (e.g., reactions) can aways contribute to any of the higher eves (e.g., to different subnetworks, metabosystem and sampes) to different degrees [16]. The set of S-R pairs for each reaction in a sampe is the ony observabe variabe, with other variabes (the subnetwork (Y ) and metabosystem assignments (Z) of the reactions) being atent variabes. In order to capture the dependence among the many variabes more concisey, a pate diagram is shown in Figure 1.1 with the mathematica definitions given in Tabe 1.1. The outer pate represents the tota data, the midde pate represents sampes, and the inner pate represents reactions within a sampe. The reative contribution of each metabosystem to the n th microbiome sampe is modeed via the variabe θ n, which is a probabiity vector of K vaues that sum to one. Typicay, a

13 5 Figure 1.1: Pate diagram for the BiomeNet mode. This mode specifies a generative process; couping between substrate-product pairs is enforced by conditioning their generation on a singe subnetwork membership [16]. Note: Adapted from BiomeNet, A bayesian mode for inference of metabic divergence among mibrobia communities by Mahdi Shafei, Katherine A Dunn. PLoS Comput Bio, 10(11):e , sparse symmetric Dirichet priori is paced on θ. θ Dirichet(α θ ) The unique mixture of L subnetworks for each metabosystem is modeed with a probabiity vector, ϕ k, of mixing probabiities that sum to one. Thus, given K metabosystems, a K L matrix caed ϕ is used to represent the reative contribution of the th subnetwork to the k th metabosystem. Typicay, a sparse symmetric Dirichet prior is paced on ϕ. ϕ k Dirichet(α ϕ )

14 6 Because reactions are inked through shared chemica compounds, each subnetwork has its own substrate (S) and product (R) groups. Note that the products of one reaction can serve as the substrate of another reaction (i.e., the intermediary compounds in a metaboic pathway), and the membership of compounds to the S and R groups must be soft (probabiistic) rather than discrete. Thus, for L subnetworks, there wi be L substrate and products groups. Each subnetwork has a vector of C compounds for its own substrate and products, denoted by δ and γ respectivey, summing to one. For L subnetworks, there wi be an L C matrix, δ, for substrate compounds and another L C matrix, γ, for product compounds. A vaue in row and coumn c of one of these matrices gives the contribution of compound c to a subnetwork (as either a substrate or product, depending on the matrix). As above, a sparse symmetric Dirichet prior is paced on δ and γ. δ Dirichet(α δ ) γ Dirichet(α γ ) Variabe Type Meaning N integer number of microbiome sampe(e.g 38) K integer number of metabosystems (e.g 3) L integer number of subnetworks (e.g 100) C integer number of compounds (e.g 2713) θ probabiity prior distribution of metabosystems in a sampe ϕ probabiity prior distribution of subnetworks in metabosystems δ probabiity prior distribution of substrate compounds in subnetworks γ probabiity prior distribution of product compounds in subnetworks α probabiity concentration parameter of the Dirichet distribution n integer index of a sampe i integer index of a reaction j integer index of a substrate-product pairs Z ni vector metabosystem assignment for the i th reaction in the n th sampe Y ni vector subnetwork assignment for the i th reaction in the n th sampe S nij vector substrate compounds assigned for the J in S-R pairs R nij vector product compounds assigned for the J in S-R pairs Tabe 1.1: Mathematica definition of the pate diagram

15 7 Whie the prior probabiities are in the Dirichet distributions, the posterior distributions are in a mutinomia distribution. The reative contribution of each metabosystem and subnetwork to the sampe n is modeed as Z ni θ n Muti(θ n ) Y ni Z ni Muti(ϕ zni ) Where Z ni and Y ni denote the metabosystem and subnetwork assignments for reaction i in sampe n. Inference under BiomeNet invoves samping the posterior distribution of the subnetwork and metabosystem assignments for each reaction in the data and integrating out a other atent variabes. Specificay, coapsed Gibb samping is used to sampe from the conditiona distribution P (Z, Y R, S, α θ, α ϕ, α δ, α γ ), with the atent variabe θ, ϕ, δ, γ integrated out. The subnetwork and metabosystem assignments are samped for a singe reaction in a singe sampe, P (Z, Y S, R), given the set of subnetwork and metabosystem assignment for a other reactions in a other sampes except ony reaction i in sampe n (Y ni and Z ni respectivey). Each iteration of Gibb samping not ony provides a joint posterior distribution samping point on P (Z, Y S, R), but aso provide a samping point from a margina distribution P (Z ni, Y ni S, R). Based on the posterior distribution of metabosystems and subnetworks assignments, P (Z S, R) and P (Y S, R), the posterior distribution of θ and ϕ, and their means, can be directy inferred if sufficient iterations of Gibbs samping have been carried out [16]. 1.3 The chaenge of appying BiomeNet to rea data The strength of BiomeNet is that it provides a method to earn the community-eve structure of the data in terms of subnetworks and metabosystems, rather than according to curated pathways, or some other bioogy-centered definitions. Mode-derived structures hep researchers to investigate the atent metaboic structure of any microbia community, and to understand microbia community ecoogy, without having to assume that metaboic pathways have been previousy defined and annotated in a way that is suitabe to the communities under study. However, BiomeNet does have imitations, and further deveopment of the anaytica framework is warranted. The number of metabosystems K in each sampe, and the number of subnetworks L

16 8 must be pre-specified [16]. Without strong bioogica criteria of seecting the vaues of K and L, the user of BiomeNet must either (i) try different number of K and L and compare the discrepancy to choose a reasonabe vaues, or (ii) choose arbitrariy high vaues for K and L and set the concentration parameter of the Dirichet priors cose to zero as a means of pushing BiomeNet to characterize metabosystems by a reativey few predominant subnetworks and reactions. The atter is a strategy for minimizing variance and maximizing interpretabiity in the face of high vaues for K and L. The probem with both strategies is that there is no objective means of assessing which, if any, subnetworks within a metabosystem, or reactions within a subnetwork, warrant predominant status and further bioogica interpretation. A metagenomic dataset comprised of 38 mammaian gut microbiomes [16] provides both a good iustration of the chaenges, as we as a good test case of future method deveopment. This dataset is rich in metaboic information, having 2824 unique reactions invoving 2713 compounds. In the origina study, Shafiei et a [16], set K= 3 metabosystems, to match the number of dietary niches (carnivore, omnivore and herbivore) represented by the samped mammas. However, the authors found that they coud ony separate the carnivores from the herbivores, and concuded that the there was strong signa for 2 metabosystems, and ony weak signa for a third. Shafiei et a. [16] had no a priori basis to seect a good vaue for L, so they experimented with different numbers of subnetworks (i.e., L=50, 100, 150, 200) and assessed the robustness of the reaction composition of the metabosystems to L. [14]. Athough their resuts suggested that a good vaue of L was ikey somewhere between 100 and 150, they provided no means of objectivey assessing which were the most important subnetworks. When L is arger than needed, there wi be redundant subnetworks in BiomeNet. The redundant subnetworks wi carry very sma weight for many of the metabosystems and their reactions wi have ony a trivia impact in composition of each metabosystem. Even when the concentration parameter of the Dirichet priors are cose to zero, and BiomeNet has been encouraged to use reativey few predominant subnetworks and reactions, it remains uncear how to decide at what point the mixture weight of a given metabosystem shoud be considered trivia. Thus the motivation of my research is to address this chaenge.

17 9 I empoyed the mamma dataset described above to deveop and investigate severa aternative methods for discriminating the so-caed predominant subnetworks from those making ony trivia contributions to a metabosystem. The ogic that underpins my approach is that the informative subnetworks shoud make consistent contributions to each metabosystem across aternative vaues of L (as ong as L is not too sma), and that matching the components (i.e., reactions) of subnetworks can be used to identify them. However, comparison of resuts across aternative vaues of L is far from straightforward. First, subnetworks are unikey to be abeed in the same way. For exampe, subnetwork 10 under L=100 might correspond to subnetwork 33 under L=150. Second, the signa for a subnetwork in one anaysis (say, when L=100) coud be spit among severa subnetworks when the vaue for L is increased (say, when L = 200). Thus, I investigated mode-based methods for matching subnetworks from one case as inear combinations of subnetworks derived from another case. This strategy overcomes the probem of potentiay compex reationships between the components of subnetworks derived from different anayses, and it has no requirements about abeing of subnetworks among the different anayses. To achieve this, I summarize the information about the components of each subnetwork in a N L reaction matrix. Here, N is the number of unique reaction in the dataset, which is 2824 for the 38 mammaian gut metagenomes. Each coumn in the N L matrix represents a reaction s posterior mixing probabiity to the corresponding subnetwork [16] (detaied desciption of the reaction matrix in Chapter 2). In the thesis, I first evauate two cassica methods, LASSO (east absoute shrinkage and seection operator) and NNLS (non-negative east squares) regression, for matching resuts structured as described above. I then describe a new method simiar to forward seection for matching the components of subnetworks. The effectiveness of these three methods are compared, and I show that the new method is more suitabe in the case of the 38 mammaian gut metagenomes. 1.4 Structure of the thesis This thesis is comprised of four chapters. The next two chapters (2 and 3) are devoted to method deveopment and appication to rea data, In Chapter 2, I introduce two cassica methods (LASSO and NNLS) and the new method, and describe how they

18 10 can be appied to the probem of matching the metaboic components of compex microbia communities. As an exampe, I match one subnetwork from the anaysis of L=100 to the subnetworks from that of L=200 using the three methods. In Chapter 3, these three methods are appied to a the 100 subnetworks in the L = 100 run of BiomeNet to match the subnetworks in the L= 200 run. I use RSS and JSD as criteria to rank the good matches and give exampes of a good and a bad match according to those criteria. I then discriminate those we matched subnetworks by using permutation test and present the estimated coefficient matrices to find the features of the good matches. Chapter 4 concudes with the strengths and weaknesses of a three methods and provides directions for future research.

19 Chapter 2 Methods for matching the metaboic components of compex microbia communities As introduced in Chapter 1, the strength of BiomeNet is that it provides a method to earn the community-eve structure of the data in terms of subnetworks and metabosystems. One way to find if BiomeNet output is consistent with the information on the true community-eve structure is to match the subnetworks to find whether the informative subnetworks aways can be identified across aternative vaues of L when K is fixed. Here K is fixed at 3 as this was the vaue empoyed in the origina study of Shafiei.et.a [16]. If BiomeNet works we, the signa for a subnetwork in one anaysis (say, when L=100) coud be either the same or spit among severa subnetworks when the vaue for L is increased (say, L=200). Because subnetworks are unikey to be abeed in the same way, some method is needed to identify the reationships between subnetworks derived from different appications of BiomeNet to the same data. The reationships coud be either a one to one correspondence, or a subnetwork from one case being spit into severa subnetworks in another case. Each subnetwork is represented by a coumn of the posterior mixing probabiities in the N L reaction matrix, where N=2824 is the number of unique reactions for the 38 mammaian gut metagenomic data and L is the number of subnetworks. Denote the N L 1 reaction matrix from one run of BiomeNet as Y, and the N L 2 reaction matrix from different run of BiomeNet as X, the matching probem that one subnetwork in Y corresponds to one or more subnetworks in X can be represented mathematicay as foows: Y = Xβ = β 1 X 1 + β 2 X β L2 X L2 Thus the matching probem naturay can be formuated as a regression without intercept, in which case minimization of the sum of squared residuas can be used as the criterion to achieve the best fit. Matching subnetworks from one case as 11

20 12 inear combinations of another case shoud perform we because the expectation is that a arge subnetwork in L=100 wi be spit into severa smaer (and argey nonoverapping) subnetworks when L=200. Thus the inear regression mode is chosen to represent the reationship between subnetworks. Usuay, the east square inear regression coefficient vector is not sparse. However, in this appication one subnetwork is expected to be matched by either one or ony a few subnetworks, i.e. the consistency of the subnetwork outputs from BiomeNet is associated to the sparsity and the quaity of the matches. So, a reguarized version of the east squares soution may be preferabe to satisfy the sparsity of β. Naturay, LASSO regression is a good candidate method since it can resut in a good regression mode with a smaer subset of predictors for LASSO fitting. A modified LARS agorithm [11] can be used to return a fu LASSO path, and a sparse soution can be achieved with a proper mode seection criterion. If a subnetwork in Y is spit into severa subnetworks in X, then ideay the regression coefficients shoud be non-negative which means β 0. LASSO regression ony satisfies the requirement of sparsity of β, but not the non-negativity of β. Thus another good candidate method is the NNLS regression which assures the non-negativity of the coefficients. The NNLS soution is not ony non-negative, it can aso be sparse to some extent, and further sparsity can be achieved by a proper threshoding agorithm. Furthermore, since each coumn in matrices X and Y represents the posterior mixing probabiity over a reactions, the sum of a eements in a coumn shoud be 1. If one coumn can be represented by a inear combination of severa other coumns, i.e., Y =Xβ, thus 1 = 1 T Y = 1 T Xβ = 1 T β. This impies another constraint on β. Both LASSO and NNLS don t necessariy output the soution that stricty satisfy this third constraint for β. As a measure of the quaity of matching of two mutinomia probabiity distributions, residua sum of squares is not the most proper criterion. A natura criterion for this purpose is the Jensen-Shannon Divergence. Thus a new method is deveoped by directy optimizing the Jensen-Shannon divergence between the target vector Y and a sparse non-negative inear combination of X such that the coefficients sum to 1. In the rest of this chapter, I first review the methods of LASSO and NNLS and

21 13 then describe our newy deveoped method which is more suitabe for this appication. Foowing the review of each method, I show an exampe of its performance on matching a subnetwork from L = 100, denoted as Y 1, with subnetworks from L=200 obtained from the mammaian metagenomics data by using BiomeNet. 2.1 LASSO Regression Review of the method Given a set of input measurements X 1, X 2,..., X L and an outcome measurement Y, the LASSO [4] regression fits a inear mode Y = β 1 X 1 + β 2 X β L X L = Xβ with a constraint on β, the L 1 norm of the parameter vector, not greater than a given vaue. The criterion is: argmin β subject to N (y i i=1 L x ij β j ) 2 j=1 L β j s j=1 The bound s is a tuning parameter [2]. Of course when s is arge enough, the constraint has no effect, and the soution of LASSO regression is just the same as the east square inear regression. The probem can be equivaenty represented as an unconstrained minimization probem with the L 1 norm penaty λ β added as foowing: argmin 1 2 N L L (y i x ij β j ) 2 + λ β j i=1 j=1 j=1 An aternative reguarized version of east squares is Ridge regression [4], which uses the constraint that β 2, the L 2 -norm of the parameter vector, is no greater than a given vaue. Here, LASSO regression is preferred to Ridge regression, since with the penaty λ increasing, a parameters of Ridge regression are shrunk towards 0, but not exacty equa to 0; whie in LASSO more parameters are shrunk to zero which

22 14 resuts in a sparse soution [2]. Since our purpose is to find severa subnetworks that exhibit the strongest inear reation with the target subnetwork, sparsity is desirabe. Because the LASSO oss function is not quadratic, the optimization probem can not be soved using genera convex optimization methods. Here the entire LASSO path is cacuated by a modified form of the LARS (east ange regression) agorithm [2]. The R package ars is used to sove the LASSO fitting. By defaut in the package, each variabe is standardized to have unit L 2 -norm. In this appication, the variabes are mutinomia probabiities that sum to 1; so, it s not proper to standardize the variabes. The soution path is piecewise inear there are a finite number of the points at which the regression projection changes its direction [2]. The sparsity of the soution is controed by the reguarization parameter λ, which in genera is chosen by a cross-vaidation procedure on training data. In our appication, it is not proper to perform the cross-vaidation, because both responses and predictors are mutinomia distribution probabiities. The sparsity of these vectors makes the variance of the cross-vaidation resuts very arge and it is not proper to ignore the property that the sum of these vectors are a 1 s. An aternative method to cross-vaidation for choosing the tuning parameter is to use Maow s Cp as the mode seection criterion [2]. A smaer Cp vaue indicates a better mode. The Cp criterion is defined as. where ŷ i(k) is the predicted vaue of the ith observation y i from the mode with k regressors. size. SSE k = Cp = SSE k MSE p n + 2k n (y i ŷ i(k) ) 2 is the sum of squares for the mode with k predictors. i=1 MSE p is the mean square error on the compete set of p regressors. n is the sampe The compexity of the modes in the LASSO path can be represented by their degrees of freedom. The number of degrees of freedom for a inear mode is defined as the true dimension of the inear subspace in which the predictors of the mode ie. However defining the number of predictors in the LASSO mode is ony an approximation because each dimension shoudn t be counted as a fu degree of freedom due

23 15 to the shrinkage appied [2]. The approximation works due to a usefu concusion in [19] that when fitting a inear mode via LASSO stopping at some number of steps k<p, the df of the modified LAR procedure at any stage, is approximatey equa to the number of predictors in the mode. Thus if the seected mode incudes k non-zero coefficients, I define the degrees of freedom (d.f.) as k An exampe of LASSO fitting As an exampe, I use one subnetwork from the L = 100 case, denoted as Y 1,to iustrate the LASSO fitting. Note that the same Y 1 variabe is aso used as iustration exampe for the NNLS and our newy deveoped method. β β β β β β β β β β β β β β β β β Figure 2.1: Profies of LASSO coefficients. A vertica ine is drawn at df=11, the vaue chosen by smaest Cp criterion. The profies are piece-wise inear, and so are computed ony at the first 15 steps of LASSO path. β j is the coefficients of the predictor X j. With the increase of df, the predictor is added into the mode one by one. In this particuar LASSO regression, there are 153 steps in the whoe LASSO path

24 which means that 153 variabes are added into the mode at the end of LASSO. Df is used to denote the number of variabes that are aready seected into the mode at each step. df index Cp RSS λ Tabe 2.1: The first 30 steps of the LASSO path on Y 1. Df denotes degrees of freedom of the mode. Index indicates which predictor is incuded in the mode at each step. 16 Tabe 2.1 shows the first 30 steps of the LASSO path. From Tabe 2.1, the smaest Cp vaue corresponds to df=11, which means 11 variabes are seected by this mode. The LASSO path is dispayed in Figure 2.1, where a vertica ine is drawn at df=11

25 17 whichisthebest modeseectedby Maow scpcriterion. Thecoefficientsoftheseected modearepotedinfigure2.2. Fromthepot, X 19 rankshighestin matchingy andisfoowedbyx 107 andx 51. Meanwhie,X 2, X 149 andx 124 payatriviaroein matchingy sincethecorespondingcoefficients arereativeysma. TheCoeficientsofthe Modeseected β j^ Indexofβ j^ Figure2.2: ProfiesofLASSOcoefficientsoftheseectedbest modeony 1. Among the200estimatedcoefficients,11variabesareseectedinto mode,awithpositive coefficients. Theindexnumbersofthepredictorsare markednexttoeachpoint.

26 NNLS Regression Review of the method Given X 1, X 2,..., X L in X N L matrix as the predictor variabes and a coumn vector Y as the response variabe, the criterion of NNLS regression is [18]: argmin β Y Xβ 2 subject to β j 0, j = 1,..., L. The NNLS optimization probem is a quadratic programming probem [18], argmin Y Xβ 2 = argmin(β T X T Xβ Y T Xβ β T X T Y + Y T Y ) β 0 β 0 = argmin(β T X T Xβ 2Y T Xβ + Y T Y ) β 0 = argmin( 1 β 0 2 βt X T Xβ Y T Xβ) There are many simiar agorithms to sove the NNLS optimization probem, of which the first widey used agorithm is an active set method pubished by Lawson and Hanson in their 1974 book [18]. The R package nns is used to sove the NNLS fitting in this study. The NNLS regression coud resut in many positive coefficients in the mode, which is not convenient for seecting the most important predictors. In our exampe most of the estimated coefficients are extremey sma and can be negected without much increase in RSS. Therefore, threshoding these sma positive coefficients has itte infuence on the precision of the matching. Here, hard-threshoding is used in mode seection of NNLS. The hard-threshoding works by setting the threshod imit t equa to a sequence of increasing numbers. To cacuate the corresponding RSS, a the coefficients that are ess than or equa to the threshod t are set to zero. Hence I obtain a sequence of RSS corresponding to the sequence of threshods. Our threshoding criterion is to choose the maximum threshod such that the difference in RSS before and after threshoding is ess than 10 5.

27 Anexampeof NNLSfitting Asanexampe,IagainuseY 1 toiustratethennlsfiting. Amongthe200predictorvariabes NNLSanaysisresutedin71positivecoefficientswiththerestofthe coefficientsazerocoefficients. Figure2.3showsthechangeinthevauesofRSSversusthreshod. Thethreshod tissetfrom0to0.05 withanincrementof Theverticaineisthe mode indicatingthatthethreshodis Thecoefficientssmaerthan0.009arethreshodedtozero. Beforeappyingforanythreshoding,theRSSis ,andafter threshodingat0.009the RSSis with11nonzerocoefficientseft. The differencebeforeandafterthreshodingisno morethan10 5. Givensuchavery smadifferenceinrssitisreasonabetoignoretheosofprecision. ThechangeofRSSfordiferentthreshoding modes RSS threshod Figure2.3: ChangeofRSSwithincrementofthreshodinNNLSregresionofY 1. ThevauesofRSSarepotedversesthevauesofthethreshods. Theverticaineis the modechosenwhenthreshodis0.009.

28 20 coeficients TheCoeficientsofthe Modeseeted Indexofβ j^ Figure2.4: ThecoefficientsoftheNNLS modeofy 1 afterthreshoding. Amongthe 200estimatedcoefficients,11variabesremaininthe mode. Theindexnumbersof nonzeroβ j are markednexttothepoints. Thebest modeafterthreshodingispotedinfigure2.4. Fromthepot,X 19 rankshighestin matchingy andisfoowedbyx 107 andx 51. Meanwhie,X 2,X 149 andx 124 payatriviaroeinmatchingysincethecorespondingcoefficientsarevery sma. TheresutsareverysimiartotheLASSOprofie(bycomparingFigure2.4 andfigure2.2). 2.3 A methodbasedon minimizingjsd Differentfrom LASSOand NNLSregresion,ourproposed methodusesjensen- Shannon Divergence(JSD)insteadof RSSastheoptimizationcriterion. JSDisa wideyusedmethodofmeasuringthesimiaritybetweentwoprobabiitydistributions, anditiswideyappiedinbioinformaticsandgenomecomparisons[9]. TheJensen-ShannonDivergencebetweentwomutinomiaprobabiityvectorsx= (x 1,,x n )andy=(y 1,,y n )isdefinedas[9]:

29 21 JSD(x y) = 1 2 ( n i=1 x i og x i h i + n i=1 y i og y i h i ) (2.1) where h = x+y 2 is the mean vector of x and y. From the definition, it is cear that JSD is a modified version of Kuback-Leiber divergence. The direct appication of Kuback-Leiber divergence is not possibe to measure two mutinomia distributions with many zero probabiity categories. Often the ogarithm in the JSD definition uses 2 as its base, in which case JSD has the property: 0 JSD 1. Thus when the ogarithm uses e as its base, it has the property: 0 JSD n(2). In this thesis, e is used as the ogarithm base. Jensen-Shannon Distance is defined as the square root of Jensen-Shannon Divergence [7]. As a distance measure, Jensen- Shannon Distance satisfies a three properties required of a distance [9] The method Given a set of mutinomia probabiity vectors X 1, X 2,..., X p in the matrix X and a target mutinomia probabiity vector Y seected from the matrix Y, the aim is to find a inear combination of severa X variabes which has smaest JSD with the variabe Y. i.e. Ŷ = β 1 X 1 + β 2 X β L X L L with the coefficients satisfying β j = 1, β j [0, 1]. This guarantees that Ŷ is a j=1 mutinomia probabiity vector too. The objective function for finding the β is min β JSD(Y Ŷ ) Directy optimizing this function is chaenging. At first severa optimization agorithms such as gradient descent and Newton s methods with constraints [13] were tried to sove the probem directy. But the Jacobian or Hessian is unavaiabe anayticay and too expensive to compute numericay at every iteration. Then I considered an aternative method to Newton s methods (i.e., Quasi-Newton methods) without

30 requiring the Jacobian in order to search for zeros [13] [3]. However, Quasi-Newton methods often find a oca maximum and minimum of a function and the soution is highy dependent on the initia vaues [3]. Since different sets of initia vaues can resut in different oca maximum and minimum, the need to set the 200 initia vaues became a big probem. The output of 200 coefficients are not sparse at a if the input numbers are not sparse. The chaenge posed by directy optimizing this function eads us to consider other methods. A heuristic searching strategy which is simiar in nature to the forward variabe seection procedure is proposed. First, I cacuate the JSD between Y and each X j variabe, JSD(Y X 1 ), JSD(Y X 2 ),..., JSD(Y X L ), which are then ordered in an increasing order. Thus, the X variabes are ordered as X [1],..., X [L], so that the variabes in X which are more simiar to Y wi be considered first. I first assign the best match of Y as X [1] and cacuate the JSD, then I consider whether X [2] shoud be added into the mode. If the inear combination of X [1] and X [2] ead to a smaer JSD with Y variabe, then X [2] is added into the mode, otherwise the procedure is stopped. Adding X variabes according to this procedure ensures that the inear combination satisfy the constraints on the coefficients whie the JSD to Y is decreased at each step. The optimization criterion at first step is foowing: min β JSD(Y (β 1 X [1] + β 2 X [2] )) subject to β 1 + β 2 = 1, β 1 [0, 1], β 2 [0, 1] The R package named Genera-purpose Optimization is used to sove the optimization probem with constraint. The procedure stops when adding another variabe wi no onger reduce the JSD (i.e., the coefficient for the newy added variabe is zero). The detais of this forward variabe seection procedure are summarized in the foowing agorithm: Note that MJSD might not find the optimum resut for a the coefficients. This coud happen when the seected subnetworks overap each other. 22 However, when one subnetwork is spit into amost non-overapping severa smaer subnetworks (as we expect wi be a common case), the MJSD method wi provide neary optimum resuts. In the next section I give an exampe showing that the seected X variabes are non-overapping subnetworks.

31 23 Agorithm 1: A forward seection procedure to sequentiay minimize JSD 1. Initiaization (a) Cacuate the JSD between Y to each X variabe, order X variabes according to their JSD from smaest to argest. (b) Set an set P =, set R = [X [1],..., X [p] ]. Move X [1] from set R to set P, set Ŷ = X [1], β = At the ith step (i=2,... p): (a) Move X [i] variabe from set R to set P: cacuate β that minimize JSD(Y β Ŷ + (1 β )X [i] ). (b) If 1 β > 0, update Ŷ = β Ŷ + (1 β )X [i]. Go back to step 2(a). Ese if 1 β = 0, the recursive procedure stops, output Ŷ An exampe of JSD minimization method I again use Y 1 as an exampe to iustrate the method s abiity in finding the best matching set of variabes in X. The JSD of each of the 200 X variabes with Y 1 are shown in Figure 2.5. Here the smaest one is JSD(Y 1 X 19 ), the second smaest one is JSD(Y 1 X 73 ). In Figure 2.5, most of the 200 X variabes have JSD near the upper bound of n(2). The X variabe with smaest JSD vaues wi be seected by the mode at first. When β converges to 1, it shows that the program adds a the 11 smaest X j variabes to the inear combination resuting in the fina optima JSD as Tabe 2.2 shows the updated parameters of β and JSD at each step. X j is the variabe added into the mode in each step. At the beginning the initia β is equa to 1, and then it becomes a vaue between 0 and 1 with more variabes seected into the mode. The β stops at as shown in Tabe 2.2. β is 1 at the 12 step.

32 24 X j β JSD(Y Ŷ) 1 X X X X X X X X X X X Tabe2.2: Theupdatedparametersβ for MJSD methodony 1 JSD JSDofY 1 toax j indexofx j Figure2.5: ProfieofJSDofY 1 toa200x variabes.jsdvauesarepotedverses theindexof200x variabes. TheindexnumberofsomeX variabesareaso marked nexttothepoints. ˆβ j arepotedinfigure2.6. ˆβ j standsforthecoefficientofx j thathasbeen seectedintothe modeafterthewhoeprocedurestops. ˆβ j =1. X 19 stiranks

33 25 highestin matchingy,buttheweightcacuatedbythis methodisevenargerthan LASSOand NNLS methods. Thesamesetofvariabesareseectedbyathree methods,whichincude X 19,X 51,X 73,X 107,X 1,X 72,X 2,X 124,X 149,X 124,X 164,howevertheorderthatthesevariabesenterthe mode,andthecoefficients,aredifferent. CoeficientsfromMJSDmethodonY 1 Coeficients indexof200x j variabe Figure2.6: Coefficientsofthevariabesresutedfromthe MJSD method. Amongthe 200estimatedcoefficients,11X variabesareseectedinto modewithcoresponding positivecoefficients. Theindexnumbersofthepredictorsare markednexttothe points. Next,Iinvestigatedtheoverapinthereactioncompositionofthe11subnetworks withpositivecoefficientsthatwereaddedintothe modeby MJSD.Figure2.7shows athenonzeroreactionsinthe11seectedxvariabes. Forthispot,reactionswere reorderedasfoows. First,thenonzeroreactionsofX 19 wereorderedfromargest tosmaest. Then,anyremainingnonzeroreactionsofX 51 wereorderedfromargest tosmaestandaddedtothesequencederivedfromx 19. Thisproceswasrepeated untiathenonzeroreactionsofathe11x variabes werereordered. ThedistributionofreactionsinFigure2.7showsthattheseectedsubnetworksareargey

34 26 non-overappingatthereactioneve(i.e.,theyarecomprisedofdifferentreactions). Thusthethe MJSD methodshoudprovideverysimiarresutstothosethatwoud beobtainedbydirectyoptimizingtheinearcombinationofthose11x variabes. Thepositivereactionsofathe11Xvariabes positivereactions X 19 X 51 X 73 X 107 X 1 X 72 X 2 X 124 X 151 X 149 X indexoftheorderedpositivereactions Figure2.7: Thepositivereactionsofathe11X variabes. Theegendindicatesthe indexofthe11x variabesandtheircorespondingpointtypes. 2.4 Summary LASSOregresionresutssatisfytherequirementofsparsityofˆβ,throughaproper modeseectioncriteriontoseectthebest modefromafulassopath.inour exampe,thebest modeareadysatisfiesthenon-negativityrequirementofthecoefficients. Thenon-negativitywasnaturaysatisfiedin93%ofathe100subnetwork

35 27 matchings when using Cp-statistics for mode seection. For the remaining 7% of subnetworks, some very sma negative coefficients resut from LASSO fitting among their matching X variabes. By changing these sma negative coefficients to zero, the RSS is not much changed. NNLS regression resuts naturay satisfy the non-negativity constraints of the coefficients. The NNLS soution is not ony non-negative, but aso sparse to some extent. However, there are sti too many very sma positive coefficients in the NNLS regression resuts. Hence, further sparsity has to be achieved through an appropriate threshoding agorithm without much increase in the fina RSS. The new method, named as MJSD, uses a forward variabe seection procedure to simpify the optimization probem. It is based on directy minimizing JSD to obtain the positive and sparse coefficients that sum to 1, without the mode seection step in LASSO or the hard-threshoding in NNLS. Note that since LASSO and NNLS don t necessariy output a soution satisfying ˆβ1=1, the resuting Ŷ from LASSO and NNLS are not probabiity vectors. In order to compare the resuts, I isted a the fina coefficients from these three methods in Tabe 2.3. The three methods have seected the same subset of variabes in X to match Y 1, but the coefficients are different. The LASSO and NNLS resuts are more simiar in this case. To compare the fitting of these three methods, both RSS and JSD on Y 1 are used as the criteria. RSS for the three methods can be directy N cacuated from RSS = (y i ŷ i ) 2. However, before cacuating the JSD for LASSO i=1 and NNLS matches, I need to normaize the vector Ŷ to sum to 1. The normaized N vector is denoted as Ŷ = [ŷ 1,..., ŷ N ]T, where ŷ i=ŷ i / ŷ i. Then the JSD(Y Ŷ ) is used as criterion for LASSO and NNLS. JSD and RSS for the three methods on Y 1 are aso shown in Tabe 2.3. When using JSD as the criterion, the matching of MJSD is better than LASSO and NNLS, and NNLS is a itte better than LASSO. However, when using RSS as criterion, the matching of LASSO and NNLS are better than MJSD, and NNLS is sti a itte better than LASSO. This outcome is reasonabe because LASSO and NNLS use RSS as optimization criterion, whie MJSD use JSD as optimization criterion. i=1

A Brief Introduction to Markov Chains and Hidden Markov Models

A Brief Introduction to Markov Chains and Hidden Markov Models A Brief Introduction to Markov Chains and Hidden Markov Modes Aen B MacKenzie Notes for December 1, 3, &8, 2015 Discrete-Time Markov Chains You may reca that when we first introduced random processes,

More information

FRST Multivariate Statistics. Multivariate Discriminant Analysis (MDA)

FRST Multivariate Statistics. Multivariate Discriminant Analysis (MDA) 1 FRST 531 -- Mutivariate Statistics Mutivariate Discriminant Anaysis (MDA) Purpose: 1. To predict which group (Y) an observation beongs to based on the characteristics of p predictor (X) variabes, using

More information

Separation of Variables and a Spherical Shell with Surface Charge

Separation of Variables and a Spherical Shell with Surface Charge Separation of Variabes and a Spherica She with Surface Charge In cass we worked out the eectrostatic potentia due to a spherica she of radius R with a surface charge density σθ = σ cos θ. This cacuation

More information

Stochastic Variational Inference with Gradient Linearization

Stochastic Variational Inference with Gradient Linearization Stochastic Variationa Inference with Gradient Linearization Suppementa Materia Tobias Pötz * Anne S Wannenwetsch Stefan Roth Department of Computer Science, TU Darmstadt Preface In this suppementa materia,

More information

Bayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with?

Bayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with? Bayesian Learning A powerfu and growing approach in machine earning We use it in our own decision making a the time You hear a which which coud equay be Thanks or Tanks, which woud you go with? Combine

More information

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents MARKOV CHAINS AND MARKOV DECISION THEORY ARINDRIMA DATTA Abstract. In this paper, we begin with a forma introduction to probabiity and expain the concept of random variabes and stochastic processes. After

More information

XSAT of linear CNF formulas

XSAT of linear CNF formulas XSAT of inear CN formuas Bernd R. Schuh Dr. Bernd Schuh, D-50968 Kön, Germany; bernd.schuh@netcoogne.de eywords: compexity, XSAT, exact inear formua, -reguarity, -uniformity, NPcompeteness Abstract. Open

More information

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries c 26 Noninear Phenomena in Compex Systems First-Order Corrections to Gutzwier s Trace Formua for Systems with Discrete Symmetries Hoger Cartarius, Jörg Main, and Günter Wunner Institut für Theoretische

More information

Chemical Kinetics Part 2

Chemical Kinetics Part 2 Integrated Rate Laws Chemica Kinetics Part 2 The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates the rate

More information

CS229 Lecture notes. Andrew Ng

CS229 Lecture notes. Andrew Ng CS229 Lecture notes Andrew Ng Part IX The EM agorithm In the previous set of notes, we taked about the EM agorithm as appied to fitting a mixture of Gaussians. In this set of notes, we give a broader view

More information

An Algorithm for Pruning Redundant Modules in Min-Max Modular Network

An Algorithm for Pruning Redundant Modules in Min-Max Modular Network An Agorithm for Pruning Redundant Modues in Min-Max Moduar Network Hui-Cheng Lian and Bao-Liang Lu Department of Computer Science and Engineering, Shanghai Jiao Tong University 1954 Hua Shan Rd., Shanghai

More information

A. Distribution of the test statistic

A. Distribution of the test statistic A. Distribution of the test statistic In the sequentia test, we first compute the test statistic from a mini-batch of size m. If a decision cannot be made with this statistic, we keep increasing the mini-batch

More information

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Brandon Maone Department of Computer Science University of Hesini February 18, 2014 Abstract This document derives, in excrutiating

More information

Chemical Kinetics Part 2. Chapter 16

Chemical Kinetics Part 2. Chapter 16 Chemica Kinetics Part 2 Chapter 16 Integrated Rate Laws The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates

More information

STA 216 Project: Spline Approach to Discrete Survival Analysis

STA 216 Project: Spline Approach to Discrete Survival Analysis : Spine Approach to Discrete Surviva Anaysis November 4, 005 1 Introduction Athough continuous surviva anaysis differs much from the discrete surviva anaysis, there is certain ink between the two modeing

More information

Active Learning & Experimental Design

Active Learning & Experimental Design Active Learning & Experimenta Design Danie Ting Heaviy modified, of course, by Lye Ungar Origina Sides by Barbara Engehardt and Aex Shyr Lye Ungar, University of Pennsyvania Motivation u Data coection

More information

II. PROBLEM. A. Description. For the space of audio signals

II. PROBLEM. A. Description. For the space of audio signals CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time

More information

Statistical Learning Theory: A Primer

Statistical Learning Theory: A Primer Internationa Journa of Computer Vision 38(), 9 3, 2000 c 2000 uwer Academic Pubishers. Manufactured in The Netherands. Statistica Learning Theory: A Primer THEODOROS EVGENIOU, MASSIMILIANO PONTIL AND TOMASO

More information

Moreau-Yosida Regularization for Grouped Tree Structure Learning

Moreau-Yosida Regularization for Grouped Tree Structure Learning Moreau-Yosida Reguarization for Grouped Tree Structure Learning Jun Liu Computer Science and Engineering Arizona State University J.Liu@asu.edu Jieping Ye Computer Science and Engineering Arizona State

More information

Formulas for Angular-Momentum Barrier Factors Version II

Formulas for Angular-Momentum Barrier Factors Version II BNL PREPRINT BNL-QGS-06-101 brfactor1.tex Formuas for Anguar-Momentum Barrier Factors Version II S. U. Chung Physics Department, Brookhaven Nationa Laboratory, Upton, NY 11973 March 19, 2015 abstract A

More information

High Spectral Resolution Infrared Radiance Modeling Using Optimal Spectral Sampling (OSS) Method

High Spectral Resolution Infrared Radiance Modeling Using Optimal Spectral Sampling (OSS) Method High Spectra Resoution Infrared Radiance Modeing Using Optima Spectra Samping (OSS) Method J.-L. Moncet and G. Uymin Background Optima Spectra Samping (OSS) method is a fast and accurate monochromatic

More information

General Certificate of Education Advanced Level Examination June 2010

General Certificate of Education Advanced Level Examination June 2010 Genera Certificate of Education Advanced Leve Examination June 2010 Human Bioogy HBI6T/Q10/task Unit 6T A2 Investigative Skis Assignment Task Sheet The effect of using one or two eyes on the perception

More information

Combining reaction kinetics to the multi-phase Gibbs energy calculation

Combining reaction kinetics to the multi-phase Gibbs energy calculation 7 th European Symposium on Computer Aided Process Engineering ESCAPE7 V. Pesu and P.S. Agachi (Editors) 2007 Esevier B.V. A rights reserved. Combining reaction inetics to the muti-phase Gibbs energy cacuation

More information

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones ASummaryofGaussianProcesses Coryn A.L. Baier-Jones Cavendish Laboratory University of Cambridge caj@mrao.cam.ac.uk Introduction A genera prediction probem can be posed as foows. We consider that the variabe

More information

Testing for the Existence of Clusters

Testing for the Existence of Clusters Testing for the Existence of Custers Caudio Fuentes and George Casea University of Forida November 13, 2008 Abstract The detection and determination of custers has been of specia interest, among researchers

More information

Statistics for Applications. Chapter 7: Regression 1/43

Statistics for Applications. Chapter 7: Regression 1/43 Statistics for Appications Chapter 7: Regression 1/43 Heuristics of the inear regression (1) Consider a coud of i.i.d. random points (X i,y i ),i =1,...,n : 2/43 Heuristics of the inear regression (2)

More information

Unconditional security of differential phase shift quantum key distribution

Unconditional security of differential phase shift quantum key distribution Unconditiona security of differentia phase shift quantum key distribution Kai Wen, Yoshihisa Yamamoto Ginzton Lab and Dept of Eectrica Engineering Stanford University Basic idea of DPS-QKD Protoco. Aice

More information

Melodic contour estimation with B-spline models using a MDL criterion

Melodic contour estimation with B-spline models using a MDL criterion Meodic contour estimation with B-spine modes using a MDL criterion Damien Loive, Ney Barbot, Oivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-305 Lannion Cedex

More information

Asynchronous Control for Coupled Markov Decision Systems

Asynchronous Control for Coupled Markov Decision Systems INFORMATION THEORY WORKSHOP (ITW) 22 Asynchronous Contro for Couped Marov Decision Systems Michae J. Neey University of Southern Caifornia Abstract This paper considers optima contro for a coection of

More information

https://doi.org/ /epjconf/

https://doi.org/ /epjconf/ HOW TO APPLY THE OPTIMAL ESTIMATION METHOD TO YOUR LIDAR MEASUREMENTS FOR IMPROVED RETRIEVALS OF TEMPERATURE AND COMPOSITION R. J. Sica 1,2,*, A. Haefee 2,1, A. Jaai 1, S. Gamage 1 and G. Farhani 1 1 Department

More information

Nonlinear Analysis of Spatial Trusses

Nonlinear Analysis of Spatial Trusses Noninear Anaysis of Spatia Trusses João Barrigó October 14 Abstract The present work addresses the noninear behavior of space trusses A formuation for geometrica noninear anaysis is presented, which incudes

More information

AST 418/518 Instrumentation and Statistics

AST 418/518 Instrumentation and Statistics AST 418/518 Instrumentation and Statistics Cass Website: http://ircamera.as.arizona.edu/astr_518 Cass Texts: Practica Statistics for Astronomers, J.V. Wa, and C.R. Jenkins, Second Edition. Measuring the

More information

C. Fourier Sine Series Overview

C. Fourier Sine Series Overview 12 PHILIP D. LOEWEN C. Fourier Sine Series Overview Let some constant > be given. The symboic form of the FSS Eigenvaue probem combines an ordinary differentia equation (ODE) on the interva (, ) with a

More information

Explicit overall risk minimization transductive bound

Explicit overall risk minimization transductive bound 1 Expicit overa risk minimization transductive bound Sergio Decherchi, Paoo Gastado, Sandro Ridea, Rodofo Zunino Dept. of Biophysica and Eectronic Engineering (DIBE), Genoa University Via Opera Pia 11a,

More information

BP neural network-based sports performance prediction model applied research

BP neural network-based sports performance prediction model applied research Avaiabe onine www.jocpr.com Journa of Chemica and Pharmaceutica Research, 204, 6(7:93-936 Research Artice ISSN : 0975-7384 CODEN(USA : JCPRC5 BP neura networ-based sports performance prediction mode appied

More information

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization Scaabe Spectrum ocation for Large Networks ased on Sparse Optimization innan Zhuang Modem R&D Lab Samsung Semiconductor, Inc. San Diego, C Dongning Guo, Ermin Wei, and Michae L. Honig Department of Eectrica

More information

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University Turbo Codes Coding and Communication Laboratory Dept. of Eectrica Engineering, Nationa Chung Hsing University Turbo codes 1 Chapter 12: Turbo Codes 1. Introduction 2. Turbo code encoder 3. Design of intereaver

More information

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES Separation of variabes is a method to sove certain PDEs which have a warped product structure. First, on R n, a inear PDE of order m is

More information

<C 2 2. λ 2 l. λ 1 l 1 < C 1

<C 2 2. λ 2 l. λ 1 l 1 < C 1 Teecommunication Network Contro and Management (EE E694) Prof. A. A. Lazar Notes for the ecture of 7/Feb/95 by Huayan Wang (this document was ast LaT E X-ed on May 9,995) Queueing Primer for Muticass Optima

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Approximated MLC shape matrix decomposition with intereaf coision constraint Thomas Kainowski Antje Kiese Abstract Shape matrix decomposition is a subprobem in radiation therapy panning. A given fuence

More information

A proposed nonparametric mixture density estimation using B-spline functions

A proposed nonparametric mixture density estimation using B-spline functions A proposed nonparametric mixture density estimation using B-spine functions Atizez Hadrich a,b, Mourad Zribi a, Afif Masmoudi b a Laboratoire d Informatique Signa et Image de a Côte d Opae (LISIC-EA 4491),

More information

Optimality of Inference in Hierarchical Coding for Distributed Object-Based Representations

Optimality of Inference in Hierarchical Coding for Distributed Object-Based Representations Optimaity of Inference in Hierarchica Coding for Distributed Object-Based Representations Simon Brodeur, Jean Rouat NECOTIS, Département génie éectrique et génie informatique, Université de Sherbrooke,

More information

Partial permutation decoding for MacDonald codes

Partial permutation decoding for MacDonald codes Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics

More information

Algorithms to solve massively under-defined systems of multivariate quadratic equations

Algorithms to solve massively under-defined systems of multivariate quadratic equations Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations

More information

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM MIKAEL NILSSON, MATTIAS DAHL AND INGVAR CLAESSON Bekinge Institute of Technoogy Department of Teecommunications and Signa Processing

More information

A Statistical Framework for Real-time Event Detection in Power Systems

A Statistical Framework for Real-time Event Detection in Power Systems 1 A Statistica Framework for Rea-time Event Detection in Power Systems Noan Uhrich, Tim Christman, Phiip Swisher, and Xichen Jiang Abstract A quickest change detection (QCD) agorithm is appied to the probem

More information

Determining The Degree of Generalization Using An Incremental Learning Algorithm

Determining The Degree of Generalization Using An Incremental Learning Algorithm Determining The Degree of Generaization Using An Incrementa Learning Agorithm Pabo Zegers Facutad de Ingeniería, Universidad de os Andes San Caros de Apoquindo 22, Las Condes, Santiago, Chie pzegers@uandes.c

More information

Introduction. Figure 1 W8LC Line Array, box and horn element. Highlighted section modelled.

Introduction. Figure 1 W8LC Line Array, box and horn element. Highlighted section modelled. imuation of the acoustic fied produced by cavities using the Boundary Eement Rayeigh Integra Method () and its appication to a horn oudspeaer. tephen Kirup East Lancashire Institute, Due treet, Bacburn,

More information

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1 Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow

More information

General Certificate of Education Advanced Level Examination June 2010

General Certificate of Education Advanced Level Examination June 2010 Genera Certificate of Education Advanced Leve Examination June 2010 Human Bioogy HBI6T/P10/task Unit 6T A2 Investigative Skis Assignment Task Sheet The effect of temperature on the rate of photosynthesis

More information

Fast Blind Recognition of Channel Codes

Fast Blind Recognition of Channel Codes Fast Bind Recognition of Channe Codes Reza Moosavi and Erik G. Larsson Linköping University Post Print N.B.: When citing this work, cite the origina artice. 213 IEEE. Persona use of this materia is permitted.

More information

A Novel Learning Method for Elman Neural Network Using Local Search

A Novel Learning Method for Elman Neural Network Using Local Search Neura Information Processing Letters and Reviews Vo. 11, No. 8, August 2007 LETTER A Nove Learning Method for Eman Neura Networ Using Loca Search Facuty of Engineering, Toyama University, Gofuu 3190 Toyama

More information

Cryptanalysis of PKP: A New Approach

Cryptanalysis of PKP: A New Approach Cryptanaysis of PKP: A New Approach Éiane Jaumes and Antoine Joux DCSSI 18, rue du Dr. Zamenhoff F-92131 Issy-es-Mx Cedex France eiane.jaumes@wanadoo.fr Antoine.Joux@ens.fr Abstract. Quite recenty, in

More information

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems Componentwise Determination of the Interva Hu Soution for Linear Interva Parameter Systems L. V. Koev Dept. of Theoretica Eectrotechnics, Facuty of Automatics, Technica University of Sofia, 1000 Sofia,

More information

The Binary Space Partitioning-Tree Process Supplementary Material

The Binary Space Partitioning-Tree Process Supplementary Material The inary Space Partitioning-Tree Process Suppementary Materia Xuhui Fan in Li Scott. Sisson Schoo of omputer Science Fudan University ibin@fudan.edu.cn Schoo of Mathematics and Statistics University of

More information

8 Digifl'.11 Cth:uits and devices

8 Digifl'.11 Cth:uits and devices 8 Digif'. Cth:uits and devices 8. Introduction In anaog eectronics, votage is a continuous variabe. This is usefu because most physica quantities we encounter are continuous: sound eves, ight intensity,

More information

A Robust Voice Activity Detection based on Noise Eigenspace Projection

A Robust Voice Activity Detection based on Noise Eigenspace Projection A Robust Voice Activity Detection based on Noise Eigenspace Projection Dongwen Ying 1, Yu Shi 2, Frank Soong 2, Jianwu Dang 1, and Xugang Lu 1 1 Japan Advanced Institute of Science and Technoogy, Nomi

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Schoo of Computer Science Probabiistic Graphica Modes Gaussian graphica modes and Ising modes: modeing networks Eric Xing Lecture 0, February 0, 07 Reading: See cass website Eric Xing @ CMU, 005-07 Network

More information

BDD-Based Analysis of Gapped q-gram Filters

BDD-Based Analysis of Gapped q-gram Filters BDD-Based Anaysis of Gapped q-gram Fiters Marc Fontaine, Stefan Burkhardt 2 and Juha Kärkkäinen 2 Max-Panck-Institut für Informatik Stuhsatzenhausweg 85, 6623 Saarbrücken, Germany e-mai: stburk@mpi-sb.mpg.de

More information

Lobontiu: System Dynamics for Engineering Students Website Chapter 3 1. z b z

Lobontiu: System Dynamics for Engineering Students Website Chapter 3 1. z b z Chapter W3 Mechanica Systems II Introduction This companion website chapter anayzes the foowing topics in connection to the printed-book Chapter 3: Lumped-parameter inertia fractions of basic compiant

More information

A simple reliability block diagram method for safety integrity verification

A simple reliability block diagram method for safety integrity verification Reiabiity Engineering and System Safety 92 (2007) 1267 1273 www.esevier.com/ocate/ress A simpe reiabiity bock diagram method for safety integrity verification Haitao Guo, Xianhui Yang epartment of Automation,

More information

NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION

NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION Hsiao-Chang Chen Dept. of Systems Engineering University of Pennsyvania Phiadephia, PA 904-635, U.S.A. Chun-Hung Chen

More information

(This is a sample cover image for this issue. The actual cover is not yet available at this time.)

(This is a sample cover image for this issue. The actual cover is not yet available at this time.) (This is a sampe cover image for this issue The actua cover is not yet avaiabe at this time) This artice appeared in a journa pubished by Esevier The attached copy is furnished to the author for interna

More information

IE 361 Exam 1. b) Give *&% confidence limits for the bias of this viscometer. (No need to simplify.)

IE 361 Exam 1. b) Give *&% confidence limits for the bias of this viscometer. (No need to simplify.) October 9, 00 IE 6 Exam Prof. Vardeman. The viscosity of paint is measured with a "viscometer" in units of "Krebs." First, a standard iquid of "known" viscosity *# Krebs is tested with a company viscometer

More information

Chapter 7 PRODUCTION FUNCTIONS. Copyright 2005 by South-Western, a division of Thomson Learning. All rights reserved.

Chapter 7 PRODUCTION FUNCTIONS. Copyright 2005 by South-Western, a division of Thomson Learning. All rights reserved. Chapter 7 PRODUCTION FUNCTIONS Copyright 2005 by South-Western, a division of Thomson Learning. A rights reserved. 1 Production Function The firm s production function for a particuar good (q) shows the

More information

Paragraph Topic Classification

Paragraph Topic Classification Paragraph Topic Cassification Eugene Nho Graduate Schoo of Business Stanford University Stanford, CA 94305 enho@stanford.edu Edward Ng Department of Eectrica Engineering Stanford University Stanford, CA

More information

arxiv: v2 [cond-mat.stat-mech] 14 Nov 2008

arxiv: v2 [cond-mat.stat-mech] 14 Nov 2008 Random Booean Networks Barbara Drosse Institute of Condensed Matter Physics, Darmstadt University of Technoogy, Hochschustraße 6, 64289 Darmstadt, Germany (Dated: June 27) arxiv:76.335v2 [cond-mat.stat-mech]

More information

Wavelet Methods for Time Series Analysis. Wavelet-Based Signal Estimation: I. Part VIII: Wavelet-Based Signal Extraction and Denoising

Wavelet Methods for Time Series Analysis. Wavelet-Based Signal Estimation: I. Part VIII: Wavelet-Based Signal Extraction and Denoising Waveet Methods for Time Series Anaysis Part VIII: Waveet-Based Signa Extraction and Denoising overview of key ideas behind waveet-based approach description of four basic modes for signa estimation discussion

More information

A Comparison Study of the Test for Right Censored and Grouped Data

A Comparison Study of the Test for Right Censored and Grouped Data Communications for Statistica Appications and Methods 2015, Vo. 22, No. 4, 313 320 DOI: http://dx.doi.org/10.5351/csam.2015.22.4.313 Print ISSN 2287-7843 / Onine ISSN 2383-4757 A Comparison Study of the

More information

Data Mining Technology for Failure Prognostic of Avionics

Data Mining Technology for Failure Prognostic of Avionics IEEE Transactions on Aerospace and Eectronic Systems. Voume 38, #, pp.388-403, 00. Data Mining Technoogy for Faiure Prognostic of Avionics V.A. Skormin, Binghamton University, Binghamton, NY, 1390, USA

More information

Bayesian Unscented Kalman Filter for State Estimation of Nonlinear and Non-Gaussian Systems

Bayesian Unscented Kalman Filter for State Estimation of Nonlinear and Non-Gaussian Systems Bayesian Unscented Kaman Fiter for State Estimation of Noninear and Non-aussian Systems Zhong Liu, Shing-Chow Chan, Ho-Chun Wu and iafei Wu Department of Eectrica and Eectronic Engineering, he University

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Agorithmic Operations Research Vo.4 (29) 49 57 Approximated MLC shape matrix decomposition with intereaf coision constraint Antje Kiese and Thomas Kainowski Institut für Mathematik, Universität Rostock,

More information

HYDROGEN ATOM SELECTION RULES TRANSITION RATES

HYDROGEN ATOM SELECTION RULES TRANSITION RATES DOING PHYSICS WITH MATLAB QUANTUM PHYSICS Ian Cooper Schoo of Physics, University of Sydney ian.cooper@sydney.edu.au HYDROGEN ATOM SELECTION RULES TRANSITION RATES DOWNLOAD DIRECTORY FOR MATLAB SCRIPTS

More information

Research of Data Fusion Method of Multi-Sensor Based on Correlation Coefficient of Confidence Distance

Research of Data Fusion Method of Multi-Sensor Based on Correlation Coefficient of Confidence Distance Send Orders for Reprints to reprints@benthamscience.ae 340 The Open Cybernetics & Systemics Journa, 015, 9, 340-344 Open Access Research of Data Fusion Method of Muti-Sensor Based on Correation Coefficient

More information

The EM Algorithm applied to determining new limit points of Mahler measures

The EM Algorithm applied to determining new limit points of Mahler measures Contro and Cybernetics vo. 39 (2010) No. 4 The EM Agorithm appied to determining new imit points of Maher measures by Souad E Otmani, Georges Rhin and Jean-Marc Sac-Épée Université Pau Veraine-Metz, LMAM,

More information

Schedulability Analysis of Deferrable Scheduling Algorithms for Maintaining Real-Time Data Freshness

Schedulability Analysis of Deferrable Scheduling Algorithms for Maintaining Real-Time Data Freshness 1 Scheduabiity Anaysis of Deferrabe Scheduing Agorithms for Maintaining Rea-Time Data Freshness Song Han, Deji Chen, Ming Xiong, Kam-yiu Lam, Aoysius K. Mok, Krithi Ramamritham UT Austin, Emerson Process

More information

Appendix for Stochastic Gradient Monomial Gamma Sampler

Appendix for Stochastic Gradient Monomial Gamma Sampler 3 4 5 6 7 8 9 3 4 5 6 7 8 9 3 4 5 6 7 8 9 3 3 3 33 34 35 36 37 38 39 4 4 4 43 44 45 46 47 48 49 5 5 5 53 54 Appendix for Stochastic Gradient Monomia Gamma Samper A The Main Theorem We provide the foowing

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, as we as various

More information

c 2007 Society for Industrial and Applied Mathematics

c 2007 Society for Industrial and Applied Mathematics SIAM REVIEW Vo. 49,No. 1,pp. 111 1 c 7 Society for Industria and Appied Mathematics Domino Waves C. J. Efthimiou M. D. Johnson Abstract. Motivated by a proposa of Daykin [Probem 71-19*, SIAM Rev., 13 (1971),

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm 1 Asymptotic Properties of a Generaized Cross Entropy Optimization Agorithm Zijun Wu, Michae Koonko, Institute for Appied Stochastics and Operations Research, Caustha Technica University Abstract The discrete

More information

Source and Relay Matrices Optimization for Multiuser Multi-Hop MIMO Relay Systems

Source and Relay Matrices Optimization for Multiuser Multi-Hop MIMO Relay Systems Source and Reay Matrices Optimization for Mutiuser Muti-Hop MIMO Reay Systems Yue Rong Department of Eectrica and Computer Engineering, Curtin University, Bentey, WA 6102, Austraia Abstract In this paper,

More information

Throughput Optimal Scheduling for Wireless Downlinks with Reconfiguration Delay

Throughput Optimal Scheduling for Wireless Downlinks with Reconfiguration Delay Throughput Optima Scheduing for Wireess Downinks with Reconfiguration Deay Vineeth Baa Sukumaran vineethbs@gmai.com Department of Avionics Indian Institute of Space Science and Technoogy. Abstract We consider

More information

An approximate method for solving the inverse scattering problem with fixed-energy data

An approximate method for solving the inverse scattering problem with fixed-energy data J. Inv. I-Posed Probems, Vo. 7, No. 6, pp. 561 571 (1999) c VSP 1999 An approximate method for soving the inverse scattering probem with fixed-energy data A. G. Ramm and W. Scheid Received May 12, 1999

More information

From Margins to Probabilities in Multiclass Learning Problems

From Margins to Probabilities in Multiclass Learning Problems From Margins to Probabiities in Muticass Learning Probems Andrea Passerini and Massimiiano Ponti 2 and Paoo Frasconi 3 Abstract. We study the probem of muticass cassification within the framework of error

More information

Efficiently Generating Random Bits from Finite State Markov Chains

Efficiently Generating Random Bits from Finite State Markov Chains 1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

V.B The Cluster Expansion

V.B The Cluster Expansion V.B The Custer Expansion For short range interactions, speciay with a hard core, it is much better to repace the expansion parameter V( q ) by f( q ) = exp ( βv( q )), which is obtained by summing over

More information

A Separability Index for Distance-based Clustering and Classification Algorithms

A Separability Index for Distance-based Clustering and Classification Algorithms IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL.?, NO.?, JANUARY 1? 1 A Separabiity Index for Distance-based Custering and Cassification Agorithms Arka P. Ghosh, Ranjan Maitra and Anna

More information

UI FORMULATION FOR CABLE STATE OF EXISTING CABLE-STAYED BRIDGE

UI FORMULATION FOR CABLE STATE OF EXISTING CABLE-STAYED BRIDGE UI FORMULATION FOR CABLE STATE OF EXISTING CABLE-STAYED BRIDGE Juan Huang, Ronghui Wang and Tao Tang Coege of Traffic and Communications, South China University of Technoogy, Guangzhou, Guangdong 51641,

More information

PARSIMONIOUS VARIATIONAL-BAYES MIXTURE AGGREGATION WITH A POISSON PRIOR. Pierrick Bruneau, Marc Gelgon and Fabien Picarougne

PARSIMONIOUS VARIATIONAL-BAYES MIXTURE AGGREGATION WITH A POISSON PRIOR. Pierrick Bruneau, Marc Gelgon and Fabien Picarougne 17th European Signa Processing Conference (EUSIPCO 2009) Gasgow, Scotand, August 24-28, 2009 PARSIMONIOUS VARIATIONAL-BAYES MIXTURE AGGREGATION WITH A POISSON PRIOR Pierric Bruneau, Marc Gegon and Fabien

More information

Learning Fully Observed Undirected Graphical Models

Learning Fully Observed Undirected Graphical Models Learning Fuy Observed Undirected Graphica Modes Sides Credit: Matt Gormey (2016) Kayhan Batmangheich 1 Machine Learning The data inspires the structures we want to predict Inference finds {best structure,

More information

Random maps and attractors in random Boolean networks

Random maps and attractors in random Boolean networks LU TP 04-43 Rom maps attractors in rom Booean networks Björn Samuesson Car Troein Compex Systems Division, Department of Theoretica Physics Lund University, Sövegatan 4A, S-3 6 Lund, Sweden Dated: 005-05-07)

More information

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity

More information

Emmanuel Abbe Colin Sandon

Emmanuel Abbe Colin Sandon Detection in the stochastic bock mode with mutipe custers: proof of the achievabiity conjectures, acycic BP, and the information-computation gap Emmanue Abbe Coin Sandon Abstract In a paper that initiated

More information

Mechanism Deduction from Noisy Chemical Reaction Networks

Mechanism Deduction from Noisy Chemical Reaction Networks Mechanism Deduction from Noisy Chemica Reaction Networks arxiv:1803.09346v1 [physics.chem-ph] 25 Mar 2018 Jonny Proppe and Markus Reiher March 28, 2018 ETH Zurich, Laboratory of Physica Chemistry, Vadimir-Preog-Weg

More information

Lecture Note 3: Stationary Iterative Methods

Lecture Note 3: Stationary Iterative Methods MATH 5330: Computationa Methods of Linear Agebra Lecture Note 3: Stationary Iterative Methods Xianyi Zeng Department of Mathematica Sciences, UTEP Stationary Iterative Methods The Gaussian eimination (or

More information

arxiv: v1 [cs.lg] 31 Oct 2017

arxiv: v1 [cs.lg] 31 Oct 2017 ACCELERATED SPARSE SUBSPACE CLUSTERING Abofaz Hashemi and Haris Vikao Department of Eectrica and Computer Engineering, University of Texas at Austin, Austin, TX, USA arxiv:7.26v [cs.lg] 3 Oct 27 ABSTRACT

More information

Lecture 6: Moderately Large Deflection Theory of Beams

Lecture 6: Moderately Large Deflection Theory of Beams Structura Mechanics 2.8 Lecture 6 Semester Yr Lecture 6: Moderatey Large Defection Theory of Beams 6.1 Genera Formuation Compare to the cassica theory of beams with infinitesima deformation, the moderatey

More information

Learning Structural Changes of Gaussian Graphical Models in Controlled Experiments

Learning Structural Changes of Gaussian Graphical Models in Controlled Experiments Learning Structura Changes of Gaussian Graphica Modes in Controed Experiments Bai Zhang and Yue Wang Bradey Department of Eectrica and Computer Engineering Virginia Poytechnic Institute and State University

More information

High-order approximations to the Mie series for electromagnetic scattering in three dimensions

High-order approximations to the Mie series for electromagnetic scattering in three dimensions Proceedings of the 9th WSEAS Internationa Conference on Appied Mathematics Istanbu Turkey May 27-29 2006 (pp199-204) High-order approximations to the Mie series for eectromagnetic scattering in three dimensions

More information