Compositional data methods for microbiome studies
- Augustus Heath
- 5 years ago
Transcription
1 Compositional data methods for microbiome studies. M. Luz Calle, Dept. of Biosciences, UVic-UCC
2 Important role of the microbiome in human health
3 Microbiome and HIV
- How the gut microbiome affects immune reconstitution, HIV-1 replication and chronic inflammation in HIV-1-infected individuals.
- Dynamics of the microbiome and the inflammatory response after HIV infection.
- How the human microbiome can influence the AIDS vaccine response.
4 Outline
1. Why is a new algorithm for microbiome analysis needed?
2. Presentation of "SelBal: Selection of Balances", a new algorithm for microbiome differential abundance testing
5 Microbiome study
6 OTU: Operational Taxonomic Unit and taxonomy assignment
Sequences that are highly similar (e.g. 97%) are clustered together into OTUs, which are used in place of microbial species.
[Figure: sequences clustered into OTU1, OTU2, OTU3, OTU4]
7 OTU table or abundance table

           Taxon1   Taxon2   ...            TaxonM
           OTU1     OTU2     OTU3    ...    OTUk     TOTAL
Sample1    X_11     X_12     X_13    ...    X_1k     N_1
Sample2    X_21     X_22     X_23    ...    X_2k     N_2
...
Samplep    X_p1     X_p2     X_p3    ...    X_pk     N_p
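As a small illustration of the table layout above, the following sketch stores a made-up OTU table and computes the TOTAL column (the N_i). The sample names, OTU names and counts are hypothetical and are not taken from the slides.

```python
# Hypothetical OTU table: one row per sample, one column per OTU.
# All counts below are invented for illustration only.
otu_names = ["OTU1", "OTU2", "OTU3", "OTU4"]
otu_table = {
    "Sample1": [120, 30, 0, 50],
    "Sample2": [15, 5, 60, 20],
    "Sample3": [400, 0, 250, 350],
}

# N_i: total number of counts observed in sample i (the TOTAL column).
totals = {sample: sum(counts) for sample, counts in otu_table.items()}

for sample, counts in otu_table.items():
    row = ", ".join(f"{name}={c}" for name, c in zip(otu_names, counts))
    print(f"{sample}: {row}, TOTAL={totals[sample]}")
```

Note how the totals differ widely between samples even in this tiny example, which is why raw counts cannot be compared directly across samples.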
8 Microbiome differential abundance testing
- Multivariate analysis: are there global differences in microbial composition between sample groups? Adonis = PERMANOVA
- Univariate testing: which taxa are differentially abundant between sample groups? Wilcoxon, DESeq2, edgeR, ...
9 Microbiome compositional data
Microbiome data is compositional: a change in the abundance of one taxon induces changes in the observed abundances of the other taxa.
12 Microbiome compositional data
Significant results:
- Wilcoxon: "TAXA_1" "TAXA_2" "TAXA_3" "TAXA_4" "TAXA_5"
- DESeq2: "TAXA_1" "TAXA_2" "TAXA_3" "TAXA_4" "TAXA_5"
- edgeR: "TAXA_1" "TAXA_2" "TAXA_3" "TAXA_4" "TAXA_5"
Univariate tests applied to compositional data: many of the significant findings are false positives.
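A minimal sketch of why every taxon can look different even when only one of them truly changes. The absolute abundances below are hypothetical (five taxa, only TAXA_1 changes); they were chosen so that the resulting proportions match the 0.2, 0.6 and 0.1 values of the toy example in the log-ratio slide.

```python
def close(abundances):
    """Turn absolute abundances into relative abundances (proportions)."""
    total = sum(abundances)
    return [a / total for a in abundances]

taxa = ["TAXA_1", "TAXA_2", "TAXA_3", "TAXA_4", "TAXA_5"]

# Hypothetical absolute abundances: only TAXA_1 changes (100 -> 600).
absolute_a = [100, 100, 100, 100, 100]
absolute_b = [600, 100, 100, 100, 100]

relative_a = close(absolute_a)
relative_b = close(absolute_b)

for name, pa, pb in zip(taxa, relative_a, relative_b):
    status = "changed" if abs(pa - pb) > 1e-12 else "unchanged"
    print(f"{name}: {pa:.2f} -> {pb:.2f} ({status})")
```

Every observed proportion changes, so a test applied taxon by taxon to the proportions would flag all five taxa although four of them are unchanged in absolute terms.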
13 Microbiome compositional data
17 Microbiome compositional data
If the relative abundance of taxon 1 changes from $\pi_1$ to $\pi_1'$, the observed relative abundances of the other taxa change by a constant factor $F$:
$$\pi_j' = \pi_j F, \qquad F = \frac{1 - \pi_1'}{1 - \pi_1}, \qquad \text{for } j \neq 1,$$
or, equivalently, by a constant shift $S$ in log-relative abundances:
$$\log(\pi_j') = \log(\pi_j) + S, \qquad S = \log\left(\frac{1 - \pi_1'}{1 - \pi_1}\right), \qquad \text{for } j \neq 1.$$
In the toy example, the change in $\pi_1$ gives $F = (1 - \pi_1')/(1 - \pi_1) = 1/2$.
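The factor F and the shift S can be checked numerically. In this sketch the values 0.2 and 0.6 for taxon 1 are an assumption: they are taken from the log-ratio toy example (log(0.2/0.2) and log(0.6/0.1)) and happen to reproduce F = 1/2. The number of taxa (five) is also assumed.

```python
import math

def induced_factor(pi1_before, pi1_after):
    """Factor F applied to every other taxon when taxon 1 changes."""
    return (1 - pi1_after) / (1 - pi1_before)

# Assumed toy values: five taxa at 0.2 each, taxon 1 rises to 0.6.
pi_before = [0.2, 0.2, 0.2, 0.2, 0.2]
pi1_after = 0.6

F = induced_factor(pi_before[0], pi1_after)
S = math.log(F)  # constant shift on the log scale

# The other taxa are rescaled by F so that the composition still sums to 1.
pi_after = [pi1_after] + [p * F for p in pi_before[1:]]

print("F =", F)
print("S =", S)
print("composition after:", pi_after)
```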
18 Microbiome compositional data
H. pylori: relative abundance $\pi_1$ before and $\pi_1'$ after.
Shift in log-relative abundance: $S = \log(1 - \pi_1') - \log(1 - \pi_1) \approx 4$
19 Compositional data: log-ratio analysis
Let $X = (X_1, X_2, \ldots, X_k)$ be a composition of microbiome abundances.
CODA: analyze log-ratios between taxa: $\log(X_i/X_j) = \log(\pi_i/\pi_j)$
Toy example: only the log-ratios that involve taxon 1 differ between A and B:
$$\log(X_1^A/X_2^A) \neq \log(X_1^B/X_2^B): \quad \log(0.2/0.2) \neq \log(0.6/0.1), \ \ldots$$
$$\log(X_2^A/X_3^A) = \log(X_2^B/X_3^B): \quad \log(0.2/0.2) = \log(0.1/0.1), \ \ldots$$
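The toy example can be checked by enumerating all pairwise log-ratios. This is a sketch under the assumption that the toy compositions have five taxa, with A = (0.2, 0.2, 0.2, 0.2, 0.2) and B = (0.6, 0.1, 0.1, 0.1, 0.1); only the values 0.2, 0.6 and 0.1 appear on the slide.

```python
import math
from itertools import combinations

# Assumed toy compositions (five taxa); only taxon 1 truly differs.
group_a = [0.2, 0.2, 0.2, 0.2, 0.2]
group_b = [0.6, 0.1, 0.1, 0.1, 0.1]

def log_ratio(x, i, j):
    """Log-ratio between parts i and j (0-based indices) of composition x."""
    return math.log(x[i] / x[j])

# Collect the pairs (1-based taxon labels) whose log-ratio differs.
changed = [
    (i + 1, j + 1)
    for i, j in combinations(range(len(group_a)), 2)
    if not math.isclose(
        log_ratio(group_a, i, j), log_ratio(group_b, i, j), abs_tol=1e-12
    )
]

print("log-ratios that differ between A and B:", changed)
```

Only pairs containing taxon 1 are reported, which is the behaviour a log-ratio analysis relies on.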
21 Compositional balances: a new perspective for microbiome analysis (Javier Rivera, PhD thesis)
Let $X = (X_1, X_2, \ldots, X_k)$ be a composition of microbiome abundances. Instead of individual abundances, we analyze relative abundances between groups of taxa: compositional balances. This extends the concept of log-ratio between two taxa, $\log(X_i/X_j) = \log(\pi_i/\pi_j)$.
Let $X_+$ and $X_-$ be two disjoint subsets of components of $X$, with index sets $I_+$ and $I_-$ and sizes $k_+$ and $k_-$. The balance between $X_+$ and $X_-$ is defined as:
$$B = \sqrt{\frac{k_+ k_-}{k_+ + k_-}} \, \log \frac{\left(\prod_{i \in I_+} X_i\right)^{1/k_+}}{\left(\prod_{j \in I_-} X_j\right)^{1/k_-}} \;\propto\; \frac{1}{k_+} \sum_{i \in I_+} \log X_i - \frac{1}{k_-} \sum_{j \in I_-} \log X_j$$
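A minimal sketch of the balance definition in plain Python. The function name `balance` and the example counts are hypothetical; the formula is the normalised difference of mean log-abundances given above, and all parts are assumed to be strictly positive.

```python
import math

def balance(x, idx_pos, idx_neg):
    """Balance between the parts indexed by idx_pos and idx_neg.

    x        : sequence of strictly positive abundances (counts or proportions)
    idx_pos  : 0-based indices of the parts in X+
    idx_neg  : 0-based indices of the parts in X-
    """
    k_pos, k_neg = len(idx_pos), len(idx_neg)
    mean_log_pos = sum(math.log(x[i]) for i in idx_pos) / k_pos
    mean_log_neg = sum(math.log(x[j]) for j in idx_neg) / k_neg
    return math.sqrt(k_pos * k_neg / (k_pos + k_neg)) * (mean_log_pos - mean_log_neg)

# Hypothetical sample, as raw counts and as proportions.
counts = [10, 20, 30, 40]
proportions = [c / sum(counts) for c in counts]

b_counts = balance(counts, [0, 1], [2, 3])
b_props = balance(proportions, [0, 1], [2, 3])
print(b_counts, b_props)
```

The two printed values agree up to rounding: a balance depends only on ratios between parts, so it does not change when a sample is rescaled by its total.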
22 Selbal: an algorithm for selection of balances
- $Y$: response variable, numeric or dichotomous
- $X = (X_1, X_2, \ldots, X_k)$: composition
- $Z = (Z_1, Z_2, \ldots, Z_r)$: covariates
Goal: to determine the sub-compositions $X_+$ and $X_-$ so that the balance $B$ between $X_+$ and $X_-$ is highly associated with $Y$ after adjustment for $Z$.
For a continuous variable $Y$: $Y = \beta_0 + \beta_1 B + \gamma' Z$
For a dichotomous variable $Y$: $\mathrm{logit}(P(Y = 1)) = \beta_0 + \beta_1 B + \gamma' Z$
23 Selbal: an algorithm for selection of balances
STEP 0: Zero replacement.
STEP 1: Optimal balance between two components, $B^{(1)}$. The algorithm exhaustively evaluates all possible balances between two components:
$$B = \frac{1}{\sqrt{2}} \left(\log(X_i) - \log(X_j)\right), \qquad i, j \in \{1, \ldots, k\}, \ i \neq j.$$
STEP s: Optimal balance adding a new component. For $s > 1$ and given $B^{(s-1)}$, the algorithm evaluates the optimization criterion of the balance obtained by adding $\log(X_p)$ to $B^{(s-1)}$, for each variable $X_p$ that has not been included previously.
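STEP 1 can be sketched as an exhaustive search over pairs. This is an illustration only, not the selbal implementation: it assumes a continuous response, no covariates Z, data without zeros (so STEP 0 is skipped), and it uses the R² of a simple linear regression of Y on B as the criterion. The data and the helper names `r_squared` and `best_pair` are hypothetical.

```python
import math
from itertools import combinations

def r_squared(b, y):
    """R^2 of the simple linear regression of y on b (squared Pearson correlation)."""
    n = len(y)
    mean_b, mean_y = sum(b) / n, sum(y) / n
    sxy = sum((bi - mean_b) * (yi - mean_y) for bi, yi in zip(b, y))
    sxx = sum((bi - mean_b) ** 2 for bi in b)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)

def best_pair(X, y):
    """Evaluate every two-part balance and return the best pair and its R^2.

    R^2 does not depend on the sign of the balance, so unordered pairs suffice.
    """
    best, best_score = None, -1.0
    for i, j in combinations(range(len(X[0])), 2):
        b = [(math.log(row[i]) - math.log(row[j])) / math.sqrt(2) for row in X]
        score = r_squared(b, y)
        if score > best_score:
            best, best_score = (i, j), score
    return best, best_score

# Hypothetical abundances (rows = samples, columns = taxa), no zeros.
X = [
    [10, 5, 2, 8],
    [20, 7, 1, 3],
    [5, 9, 4, 6],
    [8, 3, 8, 2],
    [15, 6, 3, 9],
    [3, 4, 6, 5],
]
# Response constructed to depend only on the log-ratio of taxa 0 and 2.
y = [math.log(row[0] / row[2]) for row in X]

pair, score = best_pair(X, y)
print("best pair (0-based):", pair, "R^2:", round(score, 6))
```

Because the response was built from taxa 0 and 2, the search recovers that pair; with real data the criterion would be computed after adjusting for covariates, and AUC would replace R² for a dichotomous response.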
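24 Writing
$$B^{(s-1)} \propto M_+^{(s-1)} - M_-^{(s-1)} = \frac{1}{k_+^{(s-1)}} \sum_{i \in I_+^{(s-1)}} \log(X_i) - \frac{1}{k_-^{(s-1)}} \sum_{j \in I_-^{(s-1)}} \log(X_j),$$
the two candidate balances that include $X_p$ are
$$B^{(s+)} = \sqrt{\frac{\left(k_+^{(s-1)} + 1\right) k_-^{(s-1)}}{k_+^{(s-1)} + k_-^{(s-1)} + 1}} \left( \frac{k_+^{(s-1)} M_+^{(s-1)} + \log(X_p)}{k_+^{(s-1)} + 1} - M_-^{(s-1)} \right),$$
$$B^{(s-)} = \sqrt{\frac{k_+^{(s-1)} \left(k_-^{(s-1)} + 1\right)}{k_+^{(s-1)} + k_-^{(s-1)} + 1}} \left( M_+^{(s-1)} - \frac{k_-^{(s-1)} M_-^{(s-1)} + \log(X_p)}{k_-^{(s-1)} + 1} \right),$$
and the algorithm selects the $B^{(s)}$ that maximizes the optimization criterion ($R^2$, AUC).
STOP criterion: cross-validation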
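The update formulas for adding a component can be checked against the direct definition of a balance. The sketch below is illustrative only: the function names are hypothetical, the abundances are made up, and the square-root normalising coefficient is assumed to be the same one used in the balance definition.

```python
import math

def mean_log(x, idx):
    """Mean of the log-abundances of the parts indexed by idx."""
    return sum(math.log(x[i]) for i in idx) / len(idx)

def balance(x, idx_pos, idx_neg):
    """Balance computed directly from its definition."""
    k_pos, k_neg = len(idx_pos), len(idx_neg)
    coef = math.sqrt(k_pos * k_neg / (k_pos + k_neg))
    return coef * (mean_log(x, idx_pos) - mean_log(x, idx_neg))

def add_to_numerator(x, idx_pos, idx_neg, p):
    """B^(s+): candidate balance when part p joins the positive group."""
    k_pos, k_neg = len(idx_pos), len(idx_neg)
    m_pos, m_neg = mean_log(x, idx_pos), mean_log(x, idx_neg)
    coef = math.sqrt((k_pos + 1) * k_neg / (k_pos + k_neg + 1))
    return coef * ((k_pos * m_pos + math.log(x[p])) / (k_pos + 1) - m_neg)

def add_to_denominator(x, idx_pos, idx_neg, p):
    """B^(s-): candidate balance when part p joins the negative group."""
    k_pos, k_neg = len(idx_pos), len(idx_neg)
    m_pos, m_neg = mean_log(x, idx_pos), mean_log(x, idx_neg)
    coef = math.sqrt(k_pos * (k_neg + 1) / (k_pos + k_neg + 1))
    return coef * (m_pos - (k_neg * m_neg + math.log(x[p])) / (k_neg + 1))

# Hypothetical sample with five strictly positive parts.
x = [3.0, 7.0, 2.0, 5.0, 11.0]

plus_update = add_to_numerator(x, [0, 1], [2], 3)
plus_direct = balance(x, [0, 1, 3], [2])
minus_update = add_to_denominator(x, [0, 1], [2], 4)
minus_direct = balance(x, [0, 1], [2, 4])
print(plus_update, plus_direct)
print(minus_update, minus_direct)
```

Each update reuses the group means from the previous step, so the candidate balances can be evaluated without recomputing the sums over all included parts.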
25 Cross-validation: selbal.cv
Goals:
(1) to identify the optimal number of components to be included in the balance;
(2) to explore the robustness of the global balance identified with the whole dataset.
26 Crohn's disease
Ren et al. 2015: 662 patients with Crohn's disease and 313 controls. Abundance data at genus level (48 genera).
28 AUC = and cv-auc =
29 Comparison with other methods
Columns: METHOD, median number of taxa, mean cv-auc.
Methods: selbal, DESeq, edgeR, ANCOM, ALDEx.
30 Conclusions
- The compositional nature of microbiome data should not be ignored. This applies not only to microbiome abundance but also to gene counts in microbiome functional analysis.
- Working with relative abundances among groups of taxa (compositional balances) overcomes the problem of differences in sample size.
- The algorithm performs forward selection (suboptimal). We are working to develop a new algorithm that finds the optimal balance through penalized regression (LASSO) for compositional data.
31 Javier Rivera, Marc Noguera, Roger Paredes and the MetaHIV group; Vera Pawlowsky-Glahn, Juan José Egozcue, CODA group
32 Effects of "closing" compositional data
"Closing" compositional data (proportions or rarefaction) induces spurious correlation (Pearson 1896): two or more variables will be negatively correlated simply because the data are transformed to have a constant sum.
Example: $x = [\ldots]$, $\mathrm{cor}(x) = [\ldots]$, $\mathrm{cor}(\pi_x) = [\ldots]$
Closing also induces subcompositional incoherence in both correlations and distances.
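The spurious correlation induced by closure can be reproduced with simulated data. This sketch is not the example from the slide (whose matrices are not available here): it assumes three independent parts drawn uniformly between 1 and 10, a fixed random seed, and a hand-written Pearson correlation.

```python
import random

def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

random.seed(1)  # fixed seed so that the run is reproducible
n_samples = 2000

# Three independent positive parts per sample: no true correlation.
raw = [[random.uniform(1, 10) for _ in range(3)] for _ in range(n_samples)]

# Closure: divide each sample by its total so that the parts sum to 1.
closed = [[value / sum(row) for value in row] for row in raw]

cor_raw = pearson([r[0] for r in raw], [r[1] for r in raw])
cor_closed = pearson([r[0] for r in closed], [r[1] for r in closed])

print("correlation of parts 1 and 2 before closure:", round(cor_raw, 3))
print("correlation of parts 1 and 2 after closure: ", round(cor_closed, 3))
```

Before closure the correlation is close to zero; after closure it is clearly negative, because with a constant sum an increase in one part must be offset by the others.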
33 Statistical challenges of microbiome analysis
- Sparsity: large proportion of zeros in the OTU table
- Multivariate, with complex phylogenetic structure
- High-dimensional
- Compositional data
34 Compositional data
Consider a vector of $K$ positive components or parts, $x = (x_1, x_2, \ldots, x_K)$.
Closed compositional data describe a data set in which the parts in each sample have a constant sum: $\sum_{i=1}^{K} x_i = 1$.
Compositional data describe a data set in which the parts in each sample have an arbitrary or noninformative sum.
35 Microbiome compositional data
Microbiome data is compositional:
o Raw abundances (counts) are not informative: there is large variability in the total number of counts per sample, and the total number of counts is related to the instrument (sampling depth), not to microbiome abundance in the environment.
o Relative abundances (proportions) and rarefaction are used to obtain a closed microbiome composition; this may induce strong incoherences in correlations and distances.
More informationRegularization and Variable Selection via the Elastic Net
p. 1/1 Regularization and Variable Selection via the Elastic Net Hui Zou and Trevor Hastie Journal of Royal Statistical Society, B, 2005 Presenter: Minhua Chen, Nov. 07, 2008 p. 2/1 Agenda Introduction
More informationPhylofactorization - theory and challenges
Phylofactorization - theory and challenges Alex D. Washburne 1 1 Montana State University; alex.d.washburne@gmail.com Abstract Data from biological communities are composed of species connected by the
More informationDegenerate Expectation-Maximization Algorithm for Local Dimension Reduction
Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Xiaodong Lin 1 and Yu Zhu 2 1 Statistical and Applied Mathematical Science Institute, RTP, NC, 27709 USA University of Cincinnati,
More informationPrediction of double gene knockout measurements
Prediction of double gene knockout measurements Sofia Kyriazopoulou-Panagiotopoulou sofiakp@stanford.edu December 12, 2008 Abstract One way to get an insight into the potential interaction between a pair
More informationTesting for group differences in brain functional connectivity
Testing for group differences in brain functional connectivity Junghi Kim, Wei Pan, for ADNI Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Banff Feb
More informationPart IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation
Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation Patrick J. Heagerty PhD Department of Biostatistics University of Washington 166 ISCB 2010 Session Four Outline Examples
More informationBiologists use a system of classification to organize information about the diversity of living things.
Section 1: Biologists use a system of classification to organize information about the diversity of living things. K What I Know W What I Want to Find Out L What I Learned Essential Questions What are
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 21 Model selection Choosing the best model among a collection of models {M 1, M 2..., M N }. What is a good model? 1. fits the data well (model
More informationIntroductory compositional data (CoDa)analysis for soil
Introductory compositional data (CoDa)analysis for soil 1 scientists Léon E. Parent, department of Soils and Agrifood Engineering Université Laval, Québec 2 Definition (Aitchison, 1986) Compositional data
More informationSparse and Robust Optimization and Applications
Sparse and and Statistical Learning Workshop Les Houches, 2013 Robust Laurent El Ghaoui with Mert Pilanci, Anh Pham EECS Dept., UC Berkeley January 7, 2013 1 / 36 Outline Sparse Sparse Sparse Probability
More informationTransforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures
Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures Isabella Zwiener 1,2 *, Barbara Frisch 2, Harald Binder 2 1 Center for Thrombosis and Hemostasis (CTH), University Medical
More informationVariable Selection for Multivariate Models
Variable Selection for Multivariate Models Myth and Reality Kurt VARMUZA Vienna University of Technology Department of Statistics and Probability Theory Laboratory for ChemoMetrics www.lcm.tuwien.ac.at/vk/
More information