1 Inferring Functional Groups from Microbial Gene Catalogue with Probabilistic Topic Models
Xin Chen (1), TingTing He (2), Xiaohua Hu (1), Yuan An (1), Xindong Wu (3)
(1) College of Information Science and Technology, Drexel University, Philadelphia, PA 19104, USA
(2) Dept. of Computer Science, Central China Normal University, Wuhan, China
(3) Department of Computer Science, University of Vermont, Burlington, VT, USA
2 Background: Genomics. Genomics refers to the analysis of genomes. A genome can be thought of as the complete set of DNA sequences that encodes the hereditary material passed on from generation to generation. These DNA sequences include all of the genes (the functional and physical units of heredity passed from parent to offspring) and transcripts (the RNA copies that are the initial step in decoding the genetic information) within the genome. Thus, genomics refers to the sequencing and analysis of all of these genomic entities, including genes and transcripts, in an organism.
3 Background: GenBank and NCBI. In recent years, GenBank and other NCBI databases have grown rapidly with advances in gene sequencing technology.
4 Background: Annotation algorithms. As GenBank and NCBI have grown, many annotation algorithms have been developed to match genomic sequences against GenBank/NCBI reference sequences and attach meta-information to them.
5 Background: Meta-information. The annotated meta-information involves hierarchical data such as the NCBI Taxonomy and the Gene Ontology.
6 Challenges: Metagenomics. With rapidly advancing sequencing techniques, large numbers of sequenced genomes and metagenomes from uncultured microbial samples have become available. The goal of metagenomics is to study genome-wide gene-expression data from uncultured environmental samples (such as the ocean, soil and the human body) and to understand the underlying biological processes.
7 Research Questions. What are the major research questions of our study? We use our data mining framework to investigate the following questions:
1) Given a large number of genome fragments from microbial samples, which genomes are present? Answering this question requires mapping the metagenomic reads to taxonomic units, usually by homology-based sequence alignment; this task is also known as taxonomic classification or taxonomic analysis.
2) What are the major functions of these genomes? Answering this question involves annotating the major functional units (such as signal transduction, metabolic capacity and gene regulation) at the genome level (a.k.a. functional analysis).
Our research objective: we aim to develop a new method that analyzes the genome-level composition of DNA sequences in order to characterize a set of common genomic features shared by the same species and to identify their functional roles.
8 Related topics in this presentation: structural annotation and protein-encoding regions; homology-based functional analysis; topic models.
9 Structural annotation and protein-encoding regions. Structural annotation: annotating the regions of known open reading frames (ORFs), non-coding genes (rRNA, tRNA, miRNA), promoters and UTRs in DNA sequences.
10 Structural annotation and protein-encoding regions (continued). NCBI standard reference sequences have detailed structural annotations of both non-protein-encoding regions (such as tRNA) and protein-encoding regions (CDS), as well as the corresponding gene names (where applicable). The GenBank accession number of each reference sequence is available in each NCBI online query result.
12 Functional analysis - overview. Functional analysis uncovers the major gene functions related to genomic sequences. It requires explaining the biochemical activity (a.k.a. molecular function) of the gene product and identifying the biological process to which the gene or gene product contributes (including information about the enzymes, pathways and metabolic capabilities related to the gene).
13 Homology-based functional analysis (Richter and Huson, 2009). A homology-based approach has recently been introduced to achieve functional annotation of metagenomic reads (Richter and Huson, 2009). The framework begins with the homology-based BLASTX algorithm, which matches the metagenomic fragments against the reference sequences in the NCBI database. The BLASTX hits associate fragments with related protein IDs and gene names. The Gene Ontology (GO) database is then used to map the associated gene names to the corresponding GO terms, which provides an overview of the gene functions and products for the metagenomic fragments.
14 Homology-based functional analysis (Richter and Huson, 2009). [Figure: GO terms obtained from database identifier mapping (Richter and Huson, 2009)]
15 Limitations of Homology-based Functional Analysis Methods
1. Homology-based approaches rely heavily on the results of local sequence alignment (such as BLAST and BLASTX) to known open reading frames (ORFs). Depending on the E-value threshold used, a BLAST-like local alignment may return either hundreds of hits or no hits at all. In the latter case, current methods are unable to provide any functional annotation. In the former case, there is usually no proper tie-breaker to further reduce the hits, which makes the functional annotation ambiguous (with hundreds of probable explanations).
2. Homology-based functional annotation methods do not provide any insight into the major functional capabilities of genomes (such as which gene functions are more commonly shared by strains of the same species), because the annotated GO terms are not prioritized.
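The BLASTX-to-GO pipeline described above, and the threshold sensitivity described in limitation 1, can be sketched in a few lines. This is a minimal illustration, not the Richter and Huson implementation: it assumes hits arrive as BLAST+ tabular output (`-outfmt 6`, 12 default columns), and the read IDs, protein IDs, gene names and the two lookup tables (`protein_to_gene`, `gene_to_go`) are hypothetical stand-ins for the NCBI and GO mappings.

```python
from collections import defaultdict

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(lines, max_evalue):
    """Group subject (protein) IDs by query read, keeping hits with E-value <= max_evalue."""
    hits = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        if float(row["evalue"]) <= max_evalue:
            hits[row["qseqid"]].append(row["sseqid"])
    return dict(hits)


def annotate(hits, protein_to_gene, gene_to_go):
    """Map each read's hit proteins to gene names, then gene names to GO terms."""
    annotations = {}
    for read, proteins in hits.items():
        terms = set()
        for protein in proteins:
            gene = protein_to_gene.get(protein)
            terms.update(gene_to_go.get(gene, ()))
        annotations[read] = sorted(terms)
    return annotations


# Hypothetical BLASTX output and lookup tables, for illustration only.
blastx_lines = [
    "read1\tprotA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-30\t150",
    "read1\tprotB\t40.0\t80\t48\t2\t1\t240\t5\t85\t0.5\t30",
    "read2\tprotC\t35.0\t60\t39\t1\t1\t180\t3\t63\t2.0\t25",
]
protein_to_gene = {"protA": "geneX", "protB": "geneY", "protC": "geneZ"}
gene_to_go = {"geneX": ["GO:0005975"], "geneY": ["GO:0006810"], "geneZ": ["GO:0008152"]}

strict = annotate(parse_hits(blastx_lines, 1e-5), protein_to_gene, gene_to_go)
loose = annotate(parse_hits(blastx_lines, 10.0), protein_to_gene, gene_to_go)
```

With the strict threshold, `read2` receives no annotation at all; with the loose one, `read1` carries two GO terms and nothing in the pipeline ranks them, which is the ambiguity described in limitation 1.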
17 Topic Modeling - Intuition. Example document, with its key words highlighted (sensory, brain, visual, perception, retinal, cerebral, cortex, eye, cell, optical, nerve, image, Hubel, Wiesel): "Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a stepwise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image."
The modeling approach: assume the data we see is generated by some parameterized random process; learn the parameters that best explain the data; use the model to predict (infer) new data, based on the data seen so far.
18 Notations
Word: the basic unit, an item from a vocabulary indexed by $\{1, \dots, V\}$.
Document: a sequence of N words, denoted by $\mathbf{w} = (w_1, w_2, \dots, w_N)$.
Collection: a total of D documents, denoted by $C = \{\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_D\}$.
Topic: denoted by z; the total number of topics is K. Each topic has its own word distribution $p(w \mid z)$.
19 Background & Existing Techniques of Generative Latent Topic Models
The naïve Bayesian model (word-topic decision):
$$z^* = \arg\max_z p(z \mid w) = \arg\max_z p(z)\, p(w \mid z)$$
where $p(z)$ is the prior probability of topic z and $p(w \mid z)$ is the likelihood of word w given topic z.
The probabilistic latent semantic indexing (PLSI) model (Hofmann, 2001). Assumption: each document is a mixture of K topics. Fitting the model involves estimating the topic-specific word distributions $p(w_i \mid z_k)$ and the document-specific topic distributions $p(z_k \mid d_j)$ from the corpus via maximum likelihood estimation (MLE).
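The naïve Bayesian word-topic decision above can be sketched as follows. It is a minimal illustration: the topic names and probabilities are made up, and scoring several words at once by multiplying their likelihoods is the usual naïve Bayes independence assumption rather than something the slide spells out. Log probabilities are summed instead of multiplying raw probabilities to avoid underflow on long inputs.

```python
import math


def naive_bayes_topic(words, prior, word_given_topic):
    """Return argmax_z p(z) * prod_w p(w | z), computed in log space to avoid underflow."""
    best_topic, best_score = None, -math.inf
    for topic, p_topic in prior.items():
        score = math.log(p_topic) + sum(math.log(word_given_topic[topic][w]) for w in words)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical topics and probabilities, for illustration only.
prior = {"metabolism": 0.5, "transport": 0.5}
word_given_topic = {
    "metabolism": {"K01966": 0.7, "COG1132": 0.3},
    "transport": {"K01966": 0.2, "COG1132": 0.8},
}
```

For the single word `COG1132` the decision is `transport` (0.5 x 0.8 beats 0.5 x 0.3). Unlike PLSI, this model assigns one topic per decision and has no notion of a document mixing several topics.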
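20 Latent Dirichlet Allocation (LDA) Model (Blei, 2003)
$$\phi_j \sim \mathrm{Dir}(\beta), \qquad \theta_d \sim \mathrm{Dir}(\alpha), \qquad p(z \mid d) \sim \mathrm{Multi}(\theta_d), \qquad p(w_i \mid z_i = j) \sim \mathrm{Multi}(\phi_j)$$
$$p(z_i = j \mid w_i, \mathbf{w}_{-i}, \mathbf{z}_{-i}) \propto \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta} \cdot \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}$$
where W is the vocabulary size, T is the number of topics, $n^{(w_i)}_{-i,j}$ is the number of times word $w_i$ is assigned to topic j, and $n^{(d_i)}_{-i,j}$ is the number of words in document $d_i$ assigned to topic j, all counted excluding the current word i.
In the PLSI model, the topic mixture probabilities $p(z_k \mid d_j)$ for documents are fixed once the model is estimated; for a newly arriving document, the model has to be re-estimated, so it does not scale. The LDA model instead treats the per-document topic distribution $p(z \mid d)$ and the per-topic word distribution $p(w \mid z)$ as latent random variables, which can be updated when a new document arrives.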
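The four sampling statements above define LDA's generative process. The following minimal sketch, assuming symmetric priors and NumPy's `Generator` API, draws a small synthetic corpus from it. The function name and all sizes are illustrative, not taken from the presentation.

```python
import numpy as np


def generate_lda_corpus(num_docs, doc_length, num_topics, vocab_size, alpha, beta, seed=0):
    """Sample a synthetic corpus from the LDA generative process (symmetric priors)."""
    rng = np.random.default_rng(seed)
    # phi_j ~ Dir(beta): one word distribution per topic, shape (num_topics, vocab_size)
    phi = rng.dirichlet(np.full(vocab_size, beta), size=num_topics)
    docs, assignments = [], []
    for _ in range(num_docs):
        # theta_d ~ Dir(alpha): the document's topic mixture
        theta = rng.dirichlet(np.full(num_topics, alpha))
        # z_i ~ Multi(theta_d): one topic per word position
        z = rng.choice(num_topics, size=doc_length, p=theta)
        # w_i ~ Multi(phi_{z_i}): draw each word from its topic's distribution
        w = np.array([rng.choice(vocab_size, p=phi[j]) for j in z])
        docs.append(w)
        assignments.append(z)
    return phi, docs, assignments


phi, docs, assignments = generate_lda_corpus(
    num_docs=5, doc_length=10, num_topics=3, vocab_size=20, alpha=0.5, beta=0.1)
```

Estimation runs this process in reverse: given only `docs`, it recovers plausible `assignments`, per-document mixtures and `phi`.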
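21 LDA Model Estimation - Gibbs Sampling, a Monte Carlo process (Griffiths, 2004). The probability of a topic being assigned to a word, given the other observations:
$$p(z_i = j \mid w_i, \mathbf{w}_{-i}, \mathbf{z}_{-i}) \propto p(w_i \mid z_i = j, \mathbf{w}_{-i}, \mathbf{z}_{-i})\; p(z_i = j \mid \mathbf{w}_{-i}, \mathbf{z}_{-i})$$
$$p(w_i \mid z_i = j, \mathbf{w}_{-i}, \mathbf{z}_{-i}) = \int p(w_i \mid z_i = j, \phi_j)\, p(\phi_j \mid \mathbf{w}_{-i}, \mathbf{z}_{-i})\, d\phi_j = \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta}$$
$$p(z_i = j \mid \mathbf{w}_{-i}, \mathbf{z}_{-i}) = \int p(z_i = j \mid \theta_{d_i})\, p(\theta_{d_i} \mid \mathbf{w}_{-i}, \mathbf{z}_{-i})\, d\theta_{d_i} = \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}$$
in which
$$p(\phi_j \mid \mathbf{w}_{-i}, \mathbf{z}_{-i}) \propto p(\mathbf{w}_{-i}, \mathbf{z}_{-i} \mid \phi_j)\, p(\phi_j), \qquad p(\theta_{d_i} \mid \mathbf{w}_{-i}, \mathbf{z}_{-i}) \propto p(\mathbf{w}_{-i}, \mathbf{z}_{-i} \mid \theta_{d_i})\, p(\theta_{d_i}).$$
Since $p(\mathbf{w}_{-i}, \mathbf{z}_{-i} \mid \phi_j) \sim \mathrm{Multi}(\phi_j)$ and $p(\phi_j) \sim \mathrm{Dir}(\beta)$, it follows that $p(\phi_j \mid \mathbf{w}_{-i}, \mathbf{z}_{-i}) \sim \mathrm{Dir}(\beta + n_{-i,j})$. Similarly, since $p(\mathbf{w}_{-i}, \mathbf{z}_{-i} \mid \theta_{d_i}) \sim \mathrm{Multi}(\theta_{d_i})$ and $p(\theta_{d_i}) \sim \mathrm{Dir}(\alpha)$, we have $p(\theta_{d_i} \mid \mathbf{w}_{-i}, \mathbf{z}_{-i}) \sim \mathrm{Dir}(\alpha + n^{(d_i)}_{-i})$.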
22 Monte Carlo process. Given the word-topic posterior probability, the Monte Carlo process is straightforward: it is similar to throwing a die (given the probability of each face) to determine the topic assigned to each word for the next round. Given the probabilities $p(z_i = j \mid w_i, \mathbf{w}_{-i}, \mathbf{z}_{-i})$, $j = 1, \dots, T$, for each word, draw a new topic assignment for that word.
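The full conditional and the "die throw" above combine into a collapsed Gibbs sampler. The following is a minimal NumPy sketch, assuming symmetric priors and documents given as lists of integer word IDs. The final `theta` and `phi` are the usual posterior-mean point estimates from the counts; they are an addition for illustration, not something the slides specify.

```python
import numpy as np


def gibbs_lda(docs, num_topics, vocab_size, alpha, beta, iterations, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric priors alpha and beta."""
    rng = np.random.default_rng(seed)
    T, W = num_topics, vocab_size
    n_wt = np.zeros((W, T))          # n_j^(w): times word w is assigned to topic j
    n_t = np.zeros(T)                # n_j^(.): tokens assigned to topic j
    n_dt = np.zeros((len(docs), T))  # n_j^(d): tokens of document d assigned to topic j
    # Random initial topic assignments.
    z = [rng.integers(T, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, j in zip(doc, z[d]):
            n_wt[w, j] += 1
            n_t[j] += 1
            n_dt[d, j] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                j = z[d][i]
                # Remove token i, giving the "-i" counts in the full conditional.
                n_wt[w, j] -= 1
                n_t[j] -= 1
                n_dt[d, j] -= 1
                # Full conditional over topics; the document-side denominator
                # n_{-i,.}^(d) + T*alpha is the same for every j, so it cancels on normalizing.
                p = (n_wt[w] + beta) / (n_t + W * beta) * (n_dt[d] + alpha)
                # "Throw the die": sample the new topic for this token.
                j = rng.choice(T, p=p / p.sum())
                z[d][i] = j
                n_wt[w, j] += 1
                n_t[j] += 1
                n_dt[d, j] += 1
    # Point estimates of the per-document topic mixtures and per-topic word distributions.
    theta = (n_dt + alpha) / (n_dt.sum(axis=1, keepdims=True) + T * alpha)
    phi = ((n_wt + beta) / (n_t + W * beta)).T
    return z, theta, phi


docs = [[0, 0, 1, 1], [2, 3, 2, 3], [0, 1, 0, 1], [3, 2, 3, 2]]
z, theta, phi = gibbs_lda(docs, num_topics=2, vocab_size=4, alpha=0.1, beta=0.01, iterations=50)
```

Each sweep updates the counts in place, so a single pass costs time proportional to the number of tokens times T.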
23 Statistical relationships of words and topics [figure]
24 An example of topic assignment to words [figure]
25 Experiments
26 Experiment: Inferring Functional Groups from Microbial Gene Catalogue with Topic Models. In our experiment, based on the functional elements derived from the non-redundant CDS catalogue, we show that the configuration of functional groups in metagenome samples can be inferred by probabilistic topic modeling. Probabilistic topic modeling is a Bayesian method that can extract useful topical information from unlabeled data. When it is used to study microbial samples, the functional elements (including taxonomic levels, indicators of gene orthologous groups and KEGG pathway mappings) are analogous to words. Estimating the probabilistic topic model can uncover the configuration of functional groups (the latent topics) in each sample, which may further be used to study genotype-phenotype connections in human disease.
27 Experimental Data Collection. In our experiment, we apply probabilistic topic modeling to identify functional groups from the human gut microbial community data generated by [Qin, et al. 2010], which is openly accessible via …. The human gut microbial samples from [Qin, et al. 2010] come from both healthy subjects (HS) and patients with inflammatory bowel disease (IBD). Specifically, the IBD patients belong to two groups: one with Crohn's disease (CD) and the other with ulcerative colitis (UC). In total, there are 85 healthy samples, 15 UC samples and 12 CD samples.
28 Experimental Data Collection (continued). According to [Qin, et al. 2010], the Illumina GA reads from the human gut microbial samples are first assembled into longer contigs. The Glimmer program is then used to predict protein-coding sequences (CDS) from the assembled contigs. The predicted CDS are then aligned to each other to form a non-redundant CDS catalog (a.k.a. the minimal gut genome), which consists of 3,299,822 non-redundant CDS with an average length of 704 bp. An example catalog entry:
CDs_id: MH0001
Name: GL_MH0001_[Lack_3'-end]_[mrna]_locus=scaffold96_9:1:1206:-
Length: 1206
COG/KO: COG4799 K01966
Pathway mapping: map00280, map00640
Taxonomic level: species - Eubacterium eligens
29 Experimental Data Collection (continued). In our experiment, three types of functional elements are derived from the non-redundant CDS catalog: NCBI taxonomic level indicators, gene orthologous group indicators and KEGG pathway indicators. Given a non-redundant CDS, its NCBI taxonomic level is obtained by BLASTP alignment against the NCBI NR database, and the level assigned to each CDS is determined by a lowest common ancestor (LCA) based algorithm. The taxonomic abundance data for each sample can then be computed by counting the NCBI taxonomic level indicators. The gene orthologous group indicators and KEGG pathway indicators are assigned by BLASTP alignment of the amino-acid sequences of the predicted CDS against the eggNOG database and the KEGG database, respectively.
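The LCA step mentioned above can be sketched as follows: the CDS is assigned the deepest taxon that every retained BLASTP hit agrees on. This is a simplified sketch, not the exact algorithm used in the study. It assumes each hit's lineage is already available as a root-to-leaf list and that hits have been pre-filtered by score. The example lineages are illustrative.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by every lineage (each ordered from root to leaf)."""
    if not lineages:
        return None
    shared = None
    for ranks in zip(*lineages):  # walk down the ranks level by level
        if all(rank == ranks[0] for rank in ranks):
            shared = ranks[0]
        else:
            break
    return shared


# Illustrative lineages of BLASTP hits for one CDS (root -> species).
bacteroides_hits = [
    ["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales", "Bacteroidaceae",
     "Bacteroides", "Bacteroides fragilis"],
    ["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales", "Bacteroidaceae",
     "Bacteroides", "Bacteroides thetaiotaomicron"],
]
mixed_hits = [
    ["Bacteria", "Bacteroidetes", "Bacteroidia"],
    ["Bacteria", "Firmicutes", "Clostridia"],
]
```

Two hits from different *Bacteroides* species resolve to the genus, while hits spanning two phyla fall back to `Bacteria`. This is why LCA-assigned indicators appear at mixed taxonomic levels (species, genus, phylum).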
30 Experimental Data Collection (continued). Example functional elements:
NCBI taxonomic levels: genus Clostridium; genus Bacteroides; phylum Firmicutes; class Clostridia; genus Bacillus.
Orthologous group indicators: COG0463: Glycosyltransferases involved in cell wall biogenesis; COG0642: Signal transduction histidine kinase; COG1132: ABC-type multidrug transport system, ATPase and permease components; COG0438: Glycosyltransferase.
KEGG pathway indicators: map00230: Metabolism_Nucleotide Metabolism_Purine metabolism; map00240: Metabolism_Nucleotide Metabolism_Pyrimidine metabolism; map00350: Metabolism_Amino Acid Metabolism_Tyrosine metabolism.
The union of unique functional elements jointly defines a fixed word vocabulary. In total, there are 647,136 NCBI taxonomic level indicators, with a vocabulary size of 748; 1,293,764 gene orthologous group indicators, with a vocabulary size of 4,667; and 953,493 KEGG pathway indicators, with a vocabulary size of …
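The following sketch shows one way the three kinds of functional elements could become topic-model "words", with each sample treated as a document. It is an illustration only. The record layout (`taxon`, `orthologs`, `pathways` keys), the `tax:`/`og:`/`path:` prefixes and the sample names are hypothetical. The first record mirrors the example catalog entry from slide 28.

```python
from collections import Counter


def sample_to_bag(records):
    """Count the functional-element 'words' contributed by the CDS records of one sample."""
    words = []
    for rec in records:
        if rec.get("taxon"):
            words.append("tax:" + rec["taxon"])
        words.extend("og:" + og for og in rec.get("orthologs", []))
        words.extend("path:" + m for m in rec.get("pathways", []))
    return Counter(words)


def build_vocabulary(bags):
    """The union of unique functional elements across samples defines a fixed vocabulary."""
    vocab = sorted({word for bag in bags.values() for word in bag})
    return {word: index for index, word in enumerate(vocab)}


# Hypothetical records and samples, for illustration only.
rec1 = {"taxon": "species_Eubacterium eligens", "orthologs": ["COG4799", "K01966"],
        "pathways": ["map00280", "map00640"]}
rec2 = {"taxon": "genus_Bacteroides", "orthologs": ["COG1132"], "pathways": ["map00230"]}
bags = {"sample_A": sample_to_bag([rec1, rec2]), "sample_B": sample_to_bag([rec2, rec2])}
vocab = build_vocabulary(bags)
```

The prefixes keep the three vocabularies disjoint inside one index, so the per-type vocabulary sizes quoted above can still be recovered by counting words per prefix.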
31 Groups of functional elements in the microbial community. Given the non-redundant CDS catalog and the derived functional elements, we are interested in identifying frequent co-occurrence patterns of functional elements (a.k.a. functional groups).
32 Generative process of the proposed model. Functional elements commonly shared across samples may suggest functional similarity and biological relevance among samples. To capture such information, a genome-wide background distribution of functional elements needs to be estimated, which leads to the introduction of the background topic $z_0$ in the topic model.
33 Illustration of the background topic of gene OG indicators. Background topic, indicators of gene OGs (indicator: description):
COG0463: Glycosyltransferases involved in cell wall biogenesis
COG0642: Signal transduction histidine kinase
COG0582: Integrase
COG1132: ABC-type multidrug transport system, ATPase and permease components
COG0438: Glycosyltransferase
COG0745: Response regulators consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain
COG1396: Predicted transcriptional regulators
COG0577: ABC-type antimicrobial peptide transport system, permease component
COG2207: AraC-type DNA-binding domain-containing proteins
COG3250: Beta-galactosidase/beta-glucuronidase
34 Illustration of the background topic of KEGG pathway indicators. Background topic, KEGG pathway indicators (pathway map ID: description):
map00230: Metabolism_Nucleotide Metabolism_Purine metabolism
map00051: Metabolism_Carbohydrate Metabolism_Fructose and mannose metabolism
map00500: Metabolism_Carbohydrate Metabolism_Starch and sucrose metabolism
map00240: Metabolism_Nucleotide Metabolism_Pyrimidine metabolism
map00350: Metabolism_Amino Acid Metabolism_Tyrosine metabolism
map00260: Metabolism_Amino Acid Metabolism_Glycine, serine and threonine metabolism
map00010: Metabolism_Carbohydrate Metabolism_Glycolysis / Gluconeogenesis
map00620: Metabolism_Carbohydrate Metabolism_Pyruvate metabolism
map00251: Metabolism_Amino Acid Metabolism_Glutamate metabolism
map00550: Metabolism_Glycan Biosynthesis and Metabolism_Peptidoglycan biosynthesis
35 Uncovered latent topics with respect to NCBI taxonomic indicators. [Table: the most relevant latent topics (topic ID and MI score) for family_enterobacteriaceae, genus_clostridium, genus_bacteroides, phylum_bacteroidetes and phylum_firmicutes]
Discoveries: for each taxon, latent topics are sorted by their mutual information score (MI score), which serves as a relevance measure between taxa and latent topics. Phylum Firmicutes is most relevant to the background topic (Topic 0). Similarly, genus Clostridium is most relevant to Topics 50, 153 and 95, and genus Bacteroides is most relevant to Topics 156, 77, …
36 Uncovered latent topics with respect to NCBI taxonomic indicators. [Table: top-ranked latent topics with their probabilities p(topic | sample) for microbial samples MH0001, O2.UC-1 and V1.CD-1]
Discoveries: the probability of Topic 0 in the healthy and UC samples (0.475 in MH0001 and … in O2.UC-1) is much higher than in the CD samples (0.286 in V1.CD-1). This suggests that in CD samples, the proportion of bacteria belonging to phylum Firmicutes is significantly reduced. The prevalence of Topics 95 and 52 in samples O2.UC-1 and V1.CD-1 may indicate the presence, and possibly high abundance, of genus Clostridium and genus Bacteroides, respectively.
37 Uncovered latent topics with respect to NCBI taxonomic indicators [figure]
38 Summary of Discoveries. Our results are supported by recent findings in fecal microbiota studies of inflammatory bowel disease (IBD) patients [Gerber, 2007], [Harry S. et al., 2006], [Manichanh C. et al., 2006], [Walker A. et al., 2011]. A significant reduction in the proportion of bacteria belonging to phylum Firmicutes has been reported in CD samples, which is consistent with our results. This can be explained by the fact that mucosal microbial diversity is reduced in IBD, particularly in CD, where it is associated with bacterial invasion of the mucosa. In UC, the inflammation is typically more superficial; therefore, the reduction of phylum Firmicutes in UC is not significant.
39 Conclusions. We have shown that the configuration of functional groups encoded in the gene-expression data of metagenome samples can be inferred by applying probabilistic topic modeling to functional elements derived from the non-redundant CDS catalogue. The latent topics estimated from human gut microbial samples are consistent with recent findings in fecal microbiota studies, which demonstrates the effectiveness of the proposed method.
40 Future work. In the proposed model, the number of functional groups has to be specified in advance, or tuned iteratively using criteria such as log-likelihood and perplexity. In future work, we propose to use nonparametric hierarchical Bayesian models (such as the HDP model) to handle the uncertainty in the number of functional groups, which provides the flexibility to model microbial sequences with an unknown number of functional groups.
41 Questions?
42 Backup Slides
43 Mutual Information. After estimating the topic model and assigning a latent topic to each functional element, the relevance between latent topics and functional element indicators (i.e., NCBI taxonomic level indicators, gene orthologous group indicators and KEGG pathway indicators) can be obtained by calculating the mutual information (MI) between the functional element indicators and the latent topics, based on the final latent topic assignments to functional elements:
$$\mathrm{MI}(R_g, Z_t) = \sum_{R_g \in \{0,1\}} \sum_{Z_t \in \{0,1\}} p(R_g, Z_t) \log \frac{p(R_g, Z_t)}{p(R_g)\, p(Z_t)}$$
in which $R_g$ and $Z_t$ are binary indicator variables corresponding to functional element g and latent topic t, respectively. The variable pair $(R_g, Z_t)$ indicates whether latent topic t has been assigned to functional element g.
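The MI formula above can be computed directly from the final (element, topic) assignments. This sketch assumes that the probabilities are estimated as token frequencies over all assigned functional elements and that 0 log 0 = 0. The element names and assignments in the test are illustrative, not the study's data.

```python
import math
from collections import Counter


def mutual_information(assignments, element, topic):
    """MI between binary indicators R_g (token is `element`) and Z_t (token has `topic`)."""
    n = len(assignments)
    joint = Counter((g == element, t == topic) for g, t in assignments)
    mi = 0.0
    for r in (False, True):
        for z in (False, True):
            p_rz = joint[(r, z)] / n
            if p_rz == 0.0:
                continue  # 0 * log 0 is taken as 0
            p_r = (joint[(r, False)] + joint[(r, True)]) / n
            p_z = (joint[(False, z)] + joint[(True, z)]) / n
            mi += p_rz * math.log(p_rz / (p_r * p_z))
    return mi


def rank_topics(assignments, element, topics):
    """Sort topics by decreasing relevance (MI) to one functional element."""
    return sorted(topics, key=lambda t: mutual_information(assignments, element, t), reverse=True)
```

Ranking topics with `rank_topics` for a given taxon indicator produces the kind of per-taxon topic ordering reported on slide 35.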
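44 Likelihood Comparison
$$p(\mathbf{w} \mid \mathbf{z}) = \prod_{t=0}^{T} \int p(\mathbf{w}_{z_t} \mid z_t, \phi_{z_t})\, p(\phi_{z_t})\, d\phi_{z_t} = \left(\frac{\Gamma(W\beta)}{\Gamma(\beta)^{W}}\right)^{T} \prod_{t=1}^{T} \frac{\prod_{w=1}^{W} \Gamma(n^{(w)}_{t} + \beta)}{\Gamma(n^{(\cdot)}_{t} + W\beta)} \cdot \frac{\Gamma(W\eta)}{\Gamma(\eta)^{W}} \, \frac{\prod_{w=1}^{W} \Gamma(n^{(w)}_{0} + \eta)}{\Gamma(n^{(\cdot)}_{0} + W\eta)}$$
where topics $t = 1, \dots, T$ have Dirichlet prior β, the background topic $z_0$ has Dirichlet prior η, $n^{(w)}_t$ is the number of times word w is assigned to topic t, and $n^{(\cdot)}_t$ is the total number of words assigned to topic t.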
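Each factor in the closed form above is a Dirichlet-multinomial marginal for one topic's word counts, so the log-likelihood can be evaluated stably with log-gamma. This is a minimal sketch that assumes symmetric priors, with `eta` reading η as the background topic's prior. The function names are illustrative.

```python
import math


def log_dirichlet_multinomial(counts, prior):
    """log of the integral over phi of p(words | phi) Dir(phi; prior) for one topic.

    `counts[w]` is n_t^(w), the number of tokens of word w assigned to the topic.
    """
    W = len(counts)
    return (math.lgamma(W * prior) - W * math.lgamma(prior)
            + sum(math.lgamma(n + prior) for n in counts)
            - math.lgamma(sum(counts) + W * prior))


def log_likelihood(topic_counts, background_counts, beta, eta):
    """log p(w | z): T regular topics with prior beta plus one background topic with prior eta."""
    ll = sum(log_dirichlet_multinomial(counts, beta) for counts in topic_counts)
    return ll + log_dirichlet_multinomial(background_counts, eta)
```

As a sanity check: under a uniform Dir(1, 1) prior over two words, a single observed token has marginal probability 1/2. Two identical tokens have marginal probability 1/3, which is what the function returns in log form.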
45 Likelihood Comparison (continued)
46 Perplexity Comparison. The perplexity is calculated on held-out test data. In our experiment, we use a 50% subset of the functional elements as training data and the other 50% as test data. When constructing the two subsets, we ensure that the functional elements from each sample are split equally between them. Perplexity is the inverse of the geometric-mean per-element likelihood of the held-out test data, computed with the parameters inferred from the trained topic model; thus a smaller perplexity indicates a better model fit:
$$\mathrm{perplexity}(D_{\mathrm{test}}) = \exp\left( - \frac{\sum_{j=1}^{|D_{\mathrm{test}}|} \log p(\mathbf{w}_j)}{\sum_{j=1}^{|D_{\mathrm{test}}|} N_j} \right)$$
where $N_j$ is the number of functional elements in test document j.
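The per-sample 50/50 split and the perplexity formula above can be sketched as follows. The sketch rests on two assumptions. First, "equally split" is read as a random half of each sample's elements going to each side. Second, each held-out element's probability is taken as $p(w) = \sum_t \theta_{jt}\,\phi_{tw}$, using the topic mixture $\theta_j$ estimated for the same sample from its training half. The helper names are hypothetical.

```python
import math
import random


def split_per_sample(docs, seed=0):
    """Split each sample's functional elements 50/50 into training and test halves."""
    rng = random.Random(seed)
    train, test = [], []
    for doc in docs:
        tokens = list(doc)
        rng.shuffle(tokens)
        half = len(tokens) // 2
        train.append(tokens[:half])
        test.append(tokens[half:])
    return train, test


def perplexity(test_docs, theta, phi):
    """exp(-sum_j log p(w_j) / sum_j N_j), with p(w) = sum_t theta[j][t] * phi[t][w]."""
    total_log_lik, total_tokens = 0.0, 0
    for j, doc in enumerate(test_docs):
        for w in doc:
            total_log_lik += math.log(sum(theta[j][t] * phi[t][w] for t in range(len(phi))))
        total_tokens += len(doc)
    return math.exp(-total_log_lik / total_tokens)
```

A model that spreads probability uniformly over a vocabulary of four elements has perplexity 4, its effective number of equally likely choices. A better-fitting model concentrates probability on the held-out elements and scores lower.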
47 Perplexity Comparison (continued)
48 Dirichlet Process (DP) as a Non-Parametric Mixture Model. The Dirichlet process (DP) is defined as a distribution over random probability measures $G_0 \sim \mathrm{DP}(\gamma, H)$, in which γ is a concentration parameter and H is a base measure defined on a sample space Θ. By definition, for any finite measurable partition $\{A_1, \dots, A_r\}$ of Θ:
$$(G_0(A_1), \dots, G_0(A_r)) \sim \mathrm{Dirichlet}(\gamma H(A_1), \dots, \gamma H(A_r)).$$
A Dirichlet process can also be built by the stick-breaking construction:
$$G_0 = \sum_{k=1}^{\infty} \beta_k \delta_{\theta_k}, \qquad \beta_k = \alpha_k \prod_{i=1}^{k-1} (1 - \alpha_i), \qquad \alpha_k \sim \mathrm{Beta}(1, \gamma),$$
in which each data sample $x_i$ is drawn from a base distribution with associated parameters $\theta_k$ (the atoms, $\theta_k \sim H$). The weights of the mixture components $\beta = \{\beta_k\}_{k=1}^{\infty}$ are also referred to as $\beta \sim \mathrm{GEM}(\gamma)$.
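The stick-breaking construction above can be simulated by truncating it after a finite number of sticks. The following minimal sketch draws the GEM(γ) weights only, and leaves drawing the atoms $\theta_k \sim H$ to whatever base measure is in use. The function name and truncation level are illustrative.

```python
import random


def stick_breaking_weights(gamma, num_sticks, seed=0):
    """Truncated draw of beta ~ GEM(gamma): beta_k = alpha_k * prod_{i<k} (1 - alpha_i)."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(num_sticks):
        alpha_k = rng.betavariate(1.0, gamma)  # alpha_k ~ Beta(1, gamma)
        weights.append(alpha_k * remaining)
        remaining *= 1.0 - alpha_k            # length of the stick still unbroken
    return weights, remaining
```

The returned `remaining` is the mass not yet allocated by the truncation. A smaller γ breaks off larger pieces early, so fewer components carry most of the weight.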
49 Hierarchical Dirichlet Process (HDP). The hierarchical Dirichlet process (HDP) takes $G_0 \sim \mathrm{DP}(\gamma, H)$ as a global probability measure across the corpus and defines a set of child random probability measures $G_j \sim \mathrm{DP}(\alpha_0, G_0)$, one for each document j, which leads to different document-level distributions over the shared mixture components:
$$(G_j(A_1), \dots, G_j(A_r)) \sim \mathrm{Dirichlet}(\alpha_0 G_0(A_1), \dots, \alpha_0 G_0(A_r)).$$
Each $G_j$ can also be built by the stick-breaking construction:
$$G_j = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\theta_k},$$
in which $\pi_j = \{\pi_{jk}\}_{k=1}^{\infty}$ specifies the weights of mixture component k. Substituting the stick-breaking constructions of $G_0$ and $G_j$, it follows that, for any partition $\{K_1, \dots, K_r\}$ of the component indices,
$$\Big(\sum_{k \in K_1} \pi_{jk}, \dots, \sum_{k \in K_r} \pi_{jk}\Big) \sim \mathrm{Dirichlet}\Big(\alpha_0 \sum_{k \in K_1} \beta_k, \dots, \alpha_0 \sum_{k \in K_r} \beta_k\Big).$$
Based on the aggregation property of the Dirichlet distribution and its connection with the Beta distribution:
$$\pi_{jk} = \pi'_{jk} \prod_{l=1}^{k-1} (1 - \pi'_{jl}), \qquad \pi'_{jk} \sim \mathrm{Beta}\Big(\alpha_0 \beta_k,\ \alpha_0 \Big(1 - \sum_{l=1}^{k} \beta_l\Big)\Big).$$
It then follows that $\pi_j \sim \mathrm{DP}(\alpha_0, \beta)$. [Figure: stick-breaking construction of the hierarchical Dirichlet process]
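Given a (truncated) vector of global weights β, the Beta recursion above yields one document's weights $\pi_j$. This is a minimal sketch: it assumes all $\beta_k > 0$, and it clamps the tail mass to a tiny positive value so the Beta parameter stays valid under truncation and rounding. The function name is illustrative.

```python
import random


def hdp_document_weights(beta, alpha0, seed=0):
    """Truncated draw of pi_j ~ DP(alpha0, beta) via the HDP stick-breaking construction.

    pi'_jk ~ Beta(alpha0 * beta_k, alpha0 * (1 - sum_{l<=k} beta_l)),
    pi_jk = pi'_jk * prod_{l<k} (1 - pi'_jl).
    """
    rng = random.Random(seed)
    pi, remaining, cumulative = [], 1.0, 0.0
    for beta_k in beta:
        cumulative += beta_k
        # Guard: keep the Beta parameter positive despite truncation/rounding.
        tail = max(1.0 - cumulative, 1e-12)
        pi_prime = rng.betavariate(alpha0 * beta_k, alpha0 * tail)
        pi.append(pi_prime * remaining)
        remaining *= 1.0 - pi_prime
    return pi
```

Because every document reuses the same β, all documents share one set of components $\theta_k$ while weighting them differently. This sharing is what lets the HDP leave the number of functional groups open.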
10-708: Probabilistic Graphical Models, Spring 2015 19 : Bayesian Nonparametrics: The Indian Buffet Process Lecturer: Avinava Dubey Scribes: Rishav Das, Adam Brodie, and Hemank Lamba 1 Latent Variable
More informationDirichlet Enhanced Latent Semantic Analysis
Dirichlet Enhanced Latent Semantic Analysis Kai Yu Siemens Corporate Technology D-81730 Munich, Germany Kai.Yu@siemens.com Shipeng Yu Institute for Computer Science University of Munich D-80538 Munich,
More informationGEP Annotation Report
GEP Annotation Report Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript and peptide sequence files as part of your submission. Student name:
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationApplying LDA topic model to a corpus of Italian Supreme Court decisions
Applying LDA topic model to a corpus of Italian Supreme Court decisions Paolo Fantini Statistical Service of the Ministry of Justice - Italy CESS Conference - Rome - November 25, 2014 Our goal finding
More informationBLAST. Varieties of BLAST
BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database
More informationMicrobiome: 16S rrna Sequencing 3/30/2018
Microbiome: 16S rrna Sequencing 3/30/2018 Skills from Previous Lectures Central Dogma of Biology Lecture 3: Genetics and Genomics Lecture 4: Microarrays Lecture 12: ChIP-Seq Phylogenetics Lecture 13: Phylogenetics
More informationPredicting Protein Functions and Domain Interactions from Protein Interactions
Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput
More informationData Mining Techniques
Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!
More informationBayes methods for categorical data. April 25, 2017
Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationMETABOLIC PATHWAY PREDICTION/ALIGNMENT
COMPUTATIONAL SYSTEMIC BIOLOGY METABOLIC PATHWAY PREDICTION/ALIGNMENT Hofestaedt R*, Chen M Bioinformatics / Medical Informatics, Technische Fakultaet, Universitaet Bielefeld Postfach 10 01 31, D-33501
More informationA. Incorrect! In the binomial naming convention the Kingdom is not part of the name.
Microbiology Problem Drill 08: Classification of Microorganisms No. 1 of 10 1. In the binomial system of naming which term is always written in lowercase? (A) Kingdom (B) Domain (C) Genus (D) Specific
More informationINTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA
INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA XIUFENG WAN xw6@cs.msstate.edu Department of Computer Science Box 9637 JOHN A. BOYLE jab@ra.msstate.edu Department of Biochemistry and Molecular Biology
More informationA Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank
A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank Shoaib Jameel Shoaib Jameel 1, Wai Lam 2, Steven Schockaert 1, and Lidong Bing 3 1 School of Computer Science and Informatics,
More informationMicrobes and you ON THE LATEST HUMAN MICROBIOME DISCOVERIES, COMPUTATIONAL QUESTIONS AND SOME SOLUTIONS. Elizabeth Tseng
Microbes and you ON THE LATEST HUMAN MICROBIOME DISCOVERIES, COMPUTATIONAL QUESTIONS AND SOME SOLUTIONS Elizabeth Tseng Dept. of CSE, University of Washington Johanna Lampe Lab, Fred Hutchinson Cancer
More informationBasic Local Alignment Search Tool
Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses
More informationMicrobial Taxonomy and the Evolution of Diversity
19 Microbial Taxonomy and the Evolution of Diversity Copyright McGraw-Hill Global Education Holdings, LLC. Permission required for reproduction or display. 1 Taxonomy Introduction to Microbial Taxonomy
More informationUnderstanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007
Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007 The Computational Lens Exposes the computational nature of natural processes and provides a language for their description.
More informationBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process Alessandro Panella Department of Computer Science University of Illinois at Chicago Machine Learning Seminar Series February 18, 2013 Alessandro
More information- conserved in Eukaryotes. - proteins in the cluster have identifiable conserved domains. - human gene should be included in the cluster.
NCBI BLAST Services DELTA-BLAST BLAST (http://blast.ncbi.nlm.nih.gov/), Basic Local Alignment Search tool, is a suite of programs for finding similarities between biological sequences. DELTA-BLAST is a
More informationMicrobial analysis with STAMP
Microbial analysis with STAMP Conor Meehan cmeehan@itg.be A quick aside on who I am Tangents already! Who I am A postdoc at the Institute of Tropical Medicine in Antwerp, Belgium Mycobacteria evolution
More informationDocument and Topic Models: plsa and LDA
Document and Topic Models: plsa and LDA Andrew Levandoski and Jonathan Lobo CS 3750 Advanced Topics in Machine Learning 2 October 2018 Outline Topic Models plsa LSA Model Fitting via EM phits: link analysis
More informationSupplemental Materials
JOURNAL OF MICROBIOLOGY & BIOLOGY EDUCATION, May 2013, p. 107-109 DOI: http://dx.doi.org/10.1128/jmbe.v14i1.496 Supplemental Materials for Engaging Students in a Bioinformatics Activity to Introduce Gene
More informationMicrobiota: Its Evolution and Essence. Hsin-Jung Joyce Wu "Microbiota and man: the story about us
Microbiota: Its Evolution and Essence Overview q Define microbiota q Learn the tool q Ecological and evolutionary forces in shaping gut microbiota q Gut microbiota versus free-living microbe communities
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationSparse Stochastic Inference for Latent Dirichlet Allocation
Sparse Stochastic Inference for Latent Dirichlet Allocation David Mimno 1, Matthew D. Hoffman 2, David M. Blei 1 1 Dept. of Computer Science, Princeton U. 2 Dept. of Statistics, Columbia U. Presentation
More informationLatent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3:993-1022, January 2003. Following slides borrowed ant then heavily modified from: Jonathan Huang
More informationKernel Density Topic Models: Visual Topics Without Visual Words
Kernel Density Topic Models: Visual Topics Without Visual Words Konstantinos Rematas K.U. Leuven ESAT-iMinds krematas@esat.kuleuven.be Mario Fritz Max Planck Institute for Informatics mfrtiz@mpi-inf.mpg.de
More informationIntroduction To Machine Learning
Introduction To Machine Learning David Sontag New York University Lecture 21, April 14, 2016 David Sontag (NYU) Introduction To Machine Learning Lecture 21, April 14, 2016 1 / 14 Expectation maximization
More informationComputational methods for predicting protein-protein interactions
Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational
More informationInformation retrieval LSI, plsi and LDA. Jian-Yun Nie
Information retrieval LSI, plsi and LDA Jian-Yun Nie Basics: Eigenvector, Eigenvalue Ref: http://en.wikipedia.org/wiki/eigenvector For a square matrix A: Ax = λx where x is a vector (eigenvector), and
More informationBIOINFORMATICS LAB AP BIOLOGY
BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to
More informationBioinformatics Exercises
Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted
More informationGenome Annotation Project Presentation
Halogeometricum borinquense Genome Annotation Project Presentation Loci Hbor_05620 & Hbor_05470 Presented by: Mohammad Reza Najaf Tomaraei Hbor_05620 Basic Information DNA Coordinates: 527,512 528,261
More informationBayesian Nonparametrics: Dirichlet Process
Bayesian Nonparametrics: Dirichlet Process Yee Whye Teh Gatsby Computational Neuroscience Unit, UCL http://www.gatsby.ucl.ac.uk/~ywteh/teaching/npbayes2012 Dirichlet Process Cornerstone of modern Bayesian
More informationDistributed ML for DOSNs: giving power back to users
Distributed ML for DOSNs: giving power back to users Amira Soliman KTH isocial Marie Curie Initial Training Networks Part1 Agenda DOSNs and Machine Learning DIVa: Decentralized Identity Validation for
More informationNon-parametric Clustering with Dirichlet Processes
Non-parametric Clustering with Dirichlet Processes Timothy Burns SUNY at Buffalo Mar. 31 2009 T. Burns (SUNY at Buffalo) Non-parametric Clustering with Dirichlet Processes Mar. 31 2009 1 / 24 Introduction
More informationLesson Overview. Gene Regulation and Expression. Lesson Overview Gene Regulation and Expression
13.4 Gene Regulation and Expression THINK ABOUT IT Think of a library filled with how-to books. Would you ever need to use all of those books at the same time? Of course not. Now picture a tiny bacterium
More informationGibbs Sampling Methods for Multiple Sequence Alignment
Gibbs Sampling Methods for Multiple Sequence Alignment Scott C. Schmidler 1 Jun S. Liu 2 1 Section on Medical Informatics and 2 Department of Statistics Stanford University 11/17/99 1 Outline Statistical
More information2 GENE FUNCTIONAL SIMILARITY. 2.1 Semantic values of GO terms
Bioinformatics Advance Access published March 7, 2007 The Author (2007). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
More informationInferring Transcriptional Regulatory Networks from Gene Expression Data II
Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday
More informationApplying hlda to Practical Topic Modeling
Joseph Heng lengerfulluse@gmail.com CIST Lab of BUPT March 17, 2013 Outline 1 HLDA Discussion 2 the nested CRP GEM Distribution Dirichlet Distribution Posterior Inference Outline 1 HLDA Discussion 2 the
More informationBioinformatics. Dept. of Computational Biology & Bioinformatics
Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS
More informationLecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008
Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically
More informationTopic Models. Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW. Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1
Topic Models Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1 Low-Dimensional Space for Documents Last time: embedding space
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationNetworks & pathways. Hedi Peterson MTAT Bioinformatics
Networks & pathways Hedi Peterson (peterson@quretec.com) MTAT.03.239 Bioinformatics 03.11.2010 Networks are graphs Nodes Edges Edges Directed, undirected, weighted Nodes Genes Proteins Metabolites Enzymes
More informationUnderstanding Sequence, Structure and Function Relationships and the Resulting Redundancy
Understanding Sequence, Structure and Function Relationships and the Resulting Redundancy many slides by Philip E. Bourne Department of Pharmacology, UCSD Agenda Understand the relationship between sequence,
More informationReading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype
Lecture Series 7 From DNA to Protein: Genotype to Phenotype Reading Assignments Read Chapter 7 From DNA to Protein A. Genes and the Synthesis of Polypeptides Genes are made up of DNA and are expressed
More informationSharing Clusters Among Related Groups: Hierarchical Dirichlet Processes
Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes Yee Whye Teh (1), Michael I. Jordan (1,2), Matthew J. Beal (3) and David M. Blei (1) (1) Computer Science Div., (2) Dept. of Statistics
More informationLearning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute
More informationCHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON
PROKARYOTE GENES: E. COLI LAC OPERON CHAPTER 13 CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON Figure 1. Electron micrograph of growing E. coli. Some show the constriction at the location where daughter
More informationLecture 3a: Dirichlet processes
Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics
More informationChapter 8 PROBABILISTIC MODELS FOR TEXT MINING. Yizhou Sun Department of Computer Science University of Illinois at Urbana-Champaign
Chapter 8 PROBABILISTIC MODELS FOR TEXT MINING Yizhou Sun Department of Computer Science University of Illinois at Urbana-Champaign sun22@illinois.edu Hongbo Deng Department of Computer Science University
More informationPachinko Allocation: DAG-Structured Mixture Models of Topic Correlations
: DAG-Structured Mixture Models of Topic Correlations Wei Li and Andrew McCallum University of Massachusetts, Dept. of Computer Science {weili,mccallum}@cs.umass.edu Abstract Latent Dirichlet allocation
More informationBioinformatics 2 - Lecture 4
Bioinformatics 2 - Lecture 4 Guido Sanguinetti School of Informatics University of Edinburgh February 14, 2011 Sequences Many data types are ordered, i.e. you can naturally say what is before and what
More informationSequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University
Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of
More informationLatent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) A review of topic modeling and customer interactions application 3/11/2015 1 Agenda Agenda Items 1 What is topic modeling? Intro Text Mining & Pre-Processing Natural Language
More informationImage segmentation combining Markov Random Fields and Dirichlet Processes
Image segmentation combining Markov Random Fields and Dirichlet Processes Jessica SODJO IMS, Groupe Signal Image, Talence Encadrants : A. Giremus, J.-F. Giovannelli, F. Caron, N. Dobigeon Jessica SODJO
More informationBME 5742 Biosystems Modeling and Control
BME 5742 Biosystems Modeling and Control Lecture 24 Unregulated Gene Expression Model Dr. Zvi Roth (FAU) 1 The genetic material inside a cell, encoded in its DNA, governs the response of a cell to various
More informationAP Bio Module 16: Bacterial Genetics and Operons, Student Learning Guide
Name: Period: Date: AP Bio Module 6: Bacterial Genetics and Operons, Student Learning Guide Getting started. Work in pairs (share a computer). Make sure that you log in for the first quiz so that you get
More informationSUPPLEMENTARY INFORMATION
Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)
More information