Regularized non-negative matrix factorization for latent component discovery in heterogeneous methylomes


N. Vedeneev¹, P. Lutsik², M. Slawski³, J. Walter¹, M. Hein¹
¹Saarland University, Germany; ²DKFZ, Germany; ³George Mason University, USA

29th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Problem Setting: DNA methylation is one of the most important and most studied marks in modern epigenetics. It plays a central role in numerous biological processes, such as stem cell differentiation, embryonic development and diseases like cancer [1]. Each (sub-)cell type has a characteristic methylation profile. For some tissues, e.g. blood, individual cell types can be isolated and their methylation profiles measured directly. For other tissues, e.g. brain, such cell separation is difficult, and in many cases only a methylation measurement of the full tissue, which is a mixture of cell types, is available. Reliable and accurate identification of the cell type-specific methylation patterns and their corresponding mixture proportions is of great benefit for any subsequent biological analysis. This can be seen as a blind deconvolution problem, which we tackle using a problem-adapted form of non-negative matrix factorization.

In mammalian genomes, including the human genome, the DNA methylation mark occurs predominantly in the context of CpG dinucleotide sequence motifs (CpGs). While state-of-the-art sequencing-based methods make it possible to obtain the complete DNA methylation profile, cost- and labour-optimized approaches, such as the Infinium 450k and EPIC microarrays, cover a small yet representative subset of CpGs in the human genome (approximately $m = 4.5 \times 10^5$ and $m = 8.5 \times 10^5$, respectively). For each CpG $i$ of the $m$ tagged ones and each subject $j = 1, \dots, n$, Infinium microarrays output two intensity measurements $M_{ij}$ and $U_{ij}$, proportional to the number of DNA molecules in which this CpG is methylated and unmethylated, respectively. We define $N_{ij} = M_{ij} + U_{ij}$ to be the total microarray intensity at each CpG. The data matrix $D \in \mathbb{R}^{m \times n}$ contains the ratios $D_{ij} = M_{ij}/N_{ij}$. We denote by $r$ the total number of latent components (typically associated with cell types). In practice, as a rule, $m \gg n > r$.

Regularized NMF: Let $T \in \mathbb{R}^{m \times r}_+$ be a matrix representing $r$ latent components, or pure methylation profiles, and $A \in \mathbb{R}^{r \times n}_+$ a matrix of their proportions, or mixtures. Our experiments [6] suggest that a linear mixture model is plausible, i.e. $D \approx TA$, with the columns of $T$ affinely independent. As methylation is binary (on or off) at the single-cell level, in previous work [7] we modeled $T$ as a binary matrix in $\{0,1\}^{m \times r}$. While we have shown [7] that such a constraint implies uniqueness of the factorization beyond conditions such as separability [3], it turns out that this constraint is too rigid: even cells of the same cell type can have different methylation at a small fraction of the sites and thus have to be seen as a statistical mixture. In Figure 1 we plot the histogram of measured methylation values of an isolated cell type. While most of the measurements are concentrated close to zero and one, there is a certain fraction of intermediate DNA methylation.
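As an aside, the measurement model just described is easy to simulate. The following minimal sketch (not from the MeDeCom code base; the dimensions, distributions, and noise level are illustrative assumptions) generates Infinium-style data $D = M/(M+U)$ from a near-binary $T$ and simplex-constrained $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 1000, 8, 3          # CpGs, samples, latent components (illustrative)

# Near-binary latent profiles T in [0,1]^{m x r}: most entries close to 0 or 1.
T = rng.beta(0.3, 0.3, size=(m, r))

# Mixture proportions A in R_+^{r x n} with columns on the probability simplex.
A = rng.dirichlet(np.ones(r), size=n).T        # shape (r, n), columns sum to 1

# Total intensities N_ij vary over orders of magnitude across CpGs.
N = rng.integers(200, 20000, size=(m, 1)) * np.ones((1, n))

# Methylated counts M_ij ~ Binomial(N_ij, (TA)_ij); the data is D_ij = M_ij / N_ij.
P = np.clip(T @ A, 0.0, 1.0)
M = rng.binomial(N.astype(int), P)
D = M / N

print(D.shape, D.min(), D.max())   # (1000, 8), values in [0, 1]
```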

These observations motivated us, in the current work [6], where the method is called MeDeCom, to relax the constraint to $T \in [0,1]^{m \times r}$ and to solve the following regularized NMF problem:

$$
\begin{aligned}
\min_{T \in \mathbb{R}^{m \times r},\ A \in \mathbb{R}^{r \times n}} \quad & \frac{1}{2}\, \| D - TA \|_F^2 \;+\; \lambda \sum_{i=1}^{m} \sum_{j=1}^{r} T_{ij}\,(1 - T_{ij}) \\
\text{subject to} \quad & T \in [0,1]^{m \times r}, \quad A \in \mathbb{R}^{r \times n}_+, \quad \sum_{i=1}^{r} A_{ij} = 1, \quad j = 1, \dots, n.
\end{aligned}
\tag{1}
$$

The constraint on $A$ is added so that one can interpret $A_{ij}$ as the proportion of latent component $i$ in sample $j$. As we illustrate below, the factorization without the regularizer is highly non-unique, since in this application we are far away from the separability condition. However, we know from the properties of single cells that most methylation sites of a given cell type should be close to 0 or 1, and the non-convex regularization term enforces this property on the entries of $T$. As Figure 2 shows, the regularizer is the key to more accurate estimation of both the matrix $T$ and the proportions $A$, as it biases the estimated latent components $T$ towards 0 or 1. Note that the factorization of the data (blue dots) is highly non-unique, as there exist infinitely many solutions with zero fit; the strong prior resolves this non-uniqueness. This example, although artificial, demonstrates a biologically relevant extreme hard case, where the proportions of the cell types vary only very little across samples and the methylation profiles of the identified cell types in $T$ are highly correlated, or, geometrically speaking, where the observations are compactly concentrated inside the simplex spanned by the columns of $T$ and far away from some or all of its vertices and facets (which is exactly the opposite of the separability condition [3]). In instances like the one in Figure 2 we outperform [5], which is essentially based on standard NMF, by a large margin.

Figure 1: Histogram of $T$ values.

Figure 2: Hard toy case: observations are deep inside the simplex and away from its boundaries; $m = 2$, $r = 3$.

Contribution: In this workshop submission we build upon [6] and address two issues: we study families of regularizers which enforce the entries of $T$ to be close to zero or one, and we examine how the highly varying number of cells measured at each site should influence the loss used in the NMF model. We solve the optimization problem (1) (and the variants discussed below) via alternating optimization over $T$ and $A$, where we use DCA [8] for the non-convex problem in $T$.
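As a rough illustration of this alternating scheme (a minimal sketch, not the MeDeCom implementation; the step size and the use of a single projected-gradient step per surrogate are simplifying assumptions), note that the penalty $\lambda \sum_{ij} T_{ij}(1 - T_{ij})$ is concave in $T$. A DCA step therefore replaces it by its tangent at the current iterate, leaving a convex box-constrained problem; the $A$-update is a gradient step on the fit projected column-wise onto the simplex:

```python
import numpy as np

def project_simplex_columns(A):
    """Euclidean projection of each column of A onto the probability simplex."""
    r, n = A.shape
    U = -np.sort(-A, axis=0)                       # columns sorted descending
    css = np.cumsum(U, axis=0) - 1.0
    ks = np.arange(1, r + 1).reshape(-1, 1)
    cond = U - css / ks > 0
    rho = np.count_nonzero(cond, axis=0)           # last index with positive gap
    tau = css[rho - 1, np.arange(n)] / rho
    return np.maximum(A - tau, 0.0)

def alternating_step(D, T, A, lam, lr=1e-3):
    """One sweep: DCA-linearized projected-gradient update of T, then of A."""
    # DCA: the concave penalty lam * sum T(1-T) is linearized at the current T,
    # contributing the constant gradient lam * (1 - 2T) to the convex surrogate.
    G_T = (T @ A - D) @ A.T + lam * (1.0 - 2.0 * T)
    T = np.clip(T - lr * G_T, 0.0, 1.0)            # box constraint T in [0,1]
    # A-update: gradient step on the fit, then projection onto the simplex.
    G_A = T.T @ (T @ A - D)
    A = project_simplex_columns(A - lr * G_A)
    return T, A
```

In the actual method the convex subproblems would be solved more carefully; the sketch only shows where the DCA linearization of $T(1-T)$ enters.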

Regularizer: In a number of experiments we solved problem (1) and compared different regularizers:

1. Regularizers enforcing a bias towards $\{0,1\}$, represented by a function family $S_{\alpha,\beta}$ of concave two-piece cubic splines with a junction at $1/2$, such that $S_{\alpha,\beta} = \{\, s : [0,1] \to \mathbb{R} \mid s(0) = s(1) = 0,\ s(1/2) = 1/4,\ s'(1/2) = 0,\ s'(0) = \alpha,\ s'(1) = \beta \,\}$, where $\alpha, \beta$ are user-defined parameters controlling the slopes at 0 and 1. The constraint $s(1/2) = 1/4$ makes regularizers from $S_{\alpha,\beta}$ comparable with the quadratic penalty in (1), which also attains the value $1/4$ at $1/2$. It is worth noting that once $\alpha$ and $\beta$ are specified, any regularizer $s \in S_{\alpha,\beta}$ is uniquely defined (four parameters per cubic piece, with $2+2$ equalities delivered by the values and derivatives at the endpoints); see the sketch after this list.

2. Regularizers derived from a prior based on the distribution of the methylation values in isolated cell types. Here we use the known correspondence between MAP estimation under a prior and a corresponding regularizer, and fit a parametric model to the measured distribution. In our experiments the prior distribution was first fit using kernel methods; the values of the negative log of the density estimate were then further fit with polynomials of different degrees, and the resulting functions were used as regularizers.

We observed that the sensitivity of the regularization parameter, selected by cross-validation, is influenced by the choice of the regularizer. In the experiments, however, we concentrate on the effect of the loss, as it turns out that different regularizers yield similar results provided that the grid of λ-values is fine enough.
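Since the four endpoint conditions per piece determine each cubic exactly, a spline in $S_{\alpha,\beta}$ can be recovered by solving two small linear systems. The following sketch (an illustration of the stated constraints, not code from the paper) computes and evaluates both cubic pieces for given $\alpha$ and $\beta$:

```python
import numpy as np

def spline_pieces(alpha, beta):
    """Coefficients (a0..a3) of the two cubic pieces of s in S_{alpha,beta}."""
    def cubic_coeffs(x0, v0, d0, x1, v1, d1):
        # Solve for p(t) = a0 + a1*t + a2*t^2 + a3*t^3 with
        # p(x0)=v0, p'(x0)=d0, p(x1)=v1, p'(x1)=d1.
        M = np.array([
            [1, x0, x0**2,   x0**3],
            [0, 1,  2*x0, 3*x0**2],
            [1, x1, x1**2,   x1**3],
            [0, 1,  2*x1, 3*x1**2],
        ], dtype=float)
        return np.linalg.solve(M, np.array([v0, d0, v1, d1], dtype=float))

    left  = cubic_coeffs(0.0, 0.0, alpha, 0.5, 0.25, 0.0)  # s(0)=0,  s'(0)=alpha
    right = cubic_coeffs(0.5, 0.25, 0.0, 1.0, 0.0, beta)   # s(1)=0,  s'(1)=beta
    return left, right

def s_eval(t, left, right):
    """Evaluate the two-piece spline at t in [0,1]."""
    c = np.where(t < 0.5, 0, 1)
    coeffs = np.stack([left, right])[c]
    powers = np.stack([np.ones_like(t), t, t**2, t**3], axis=-1)
    return np.sum(coeffs * powers, axis=-1)
```

As a sanity check, for $\alpha = 1$, $\beta = -1$ both pieces reduce to $t - t^2$, recovering exactly the quadratic penalty $T(1-T)$ from (1).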
Modified Loss for NMF: The squared loss [SL] in (1) corresponds to a Gaussian noise model. However, as the data matrix consists of ratios of discrete measurements, this noise model might not be appropriate. In particular, the squared loss does not take into account that the number of measured cells $N_{ij}$ for each site $i$ and sample $j$ varies by orders of magnitude (a consequence of the measurement technique of the Infinium HumanMethylation450 BeadChip), and it also ignores the discrete nature of the measurements. Recall that $M_{ij}$ is the number of methylated cells in $N_{ij}$ measurements and that each measurement is performed for each cell with the same method. We therefore interpret $D_{ij}$ as the sample mean of $N_{ij}$ independent Bernoulli trials, with sample variance estimate $\hat\sigma_{ij}^2 = D_{ij}(1 - D_{ij})/N_{ij}$. Since $N_{ij}$ is relatively large, $\hat\sigma_{ij}^2$ is a good estimate of the true variance $\sigma_{ij}^2$. Furthermore, for large $N_{ij}$ the distribution of the sample mean $D_{ij}$ is well approximated by a Gaussian distribution, and we thus arrive at the noise model $D_{ij} \sim \mathcal{N}\big(\sum_{k=1}^{r} T_{ik} A_{kj},\ \sigma_{ij}^2\big)$, $i = 1, \dots, m$, $j = 1, \dots, n$. The new loss (the negative log-likelihood of this noise distribution) is a weighted squared loss [WSL] given by $\|\Lambda \odot (D - TA)\|_F^2$, where $\Lambda_{ij} = \hat\sigma_{ij}^{-1}$ and $\odot$ is the Hadamard product. To avoid the numerical problems caused by very small variance estimates, we truncate $\Lambda$ at the 0.95 quantile of the distribution of its values. Below we discuss the effect of this adapted loss compared to the standard squared loss.
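The weights admit a direct transcription (a sketch under the assumptions above; the truncation follows the text, while the small-variance guard is an added illustrative safeguard):

```python
import numpy as np

def wsl_weights(D, N, q=0.95):
    """Per-entry weights Lambda_ij = 1/sigma_hat_ij, truncated at the q-quantile."""
    var = D * (1.0 - D) / N              # Bernoulli sample-mean variance estimate
    var = np.maximum(var, 1e-12)         # guard against exact 0/1 ratios
    lam = 1.0 / np.sqrt(var)             # Lambda_ij = sigma_hat_ij^{-1}
    return np.minimum(lam, np.quantile(lam, q))   # cap the very large weights

def weighted_squared_loss(D, T, A, lam):
    """WSL fit term ||Lambda ⊙ (D - TA)||_F^2."""
    R = lam * (D - T @ A)                # Hadamard product with the residual
    return np.sum(R * R)
```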

Real data experiment: We use the well-known annotated data from the study [4], in which brain cell nuclei from a single brain sample were separated for the neuron-specific marker NeuN using fluorescence-activated cell sorting (FACS), and the obtained NeuN+ (neuronal) and NeuN− (non-neuronal) fractions were subsequently mixed in NeuN+ proportions ranging from 0.0 to 1.0 in steps of 0.1. The mixtures were then profiled using the Infinium 450k array. We use this dataset to uncover the source NeuN+ and NeuN− methylomes and their mixing proportions. In order to make the problem harder and more realistic, we only use the five mixtures corresponding to proportions 0.3 to 0.7. While the mixture ground truth is known, the reference profiles are not available, and thus we use $T_{ref}$, the average of reference profiles from 30 different patients; our estimated error in the $T$ matrix is therefore only an approximation.

The Infinium 450k microarray includes slightly more than 450,000 primer extension assays of two different design types, known as type I and type II probes. Methylation levels from the two probe types were shown to have different properties [2]. In order to exclude this bias from the data, the analysis is performed on the homogeneous subset of type I probes (comprising approximately one third of the 450k array). CpGs with probes that overlap annotated SNP positions along the complete probe sequence (dbSNP132 entries with MAF > 0.05, as defined in the RnBeads hg19 annotation) were also discarded, to eliminate the confounding genetic variability. This reduced the data to the raw type I matrix $D$. Furthermore, in a refined setting several additional layers of quality filtering were applied: each methylation value was required to be supported by at least 5 Infinium beads, and CpGs with extreme intensities (below the 0.1 and above the 0.95 quantile of all intensity values) were removed as potentially erroneous; this yields the filtered matrix $D'$. Both matrices $D$ and $D'$ are factorized in order to assess the robustness of the different loss functions with respect to noise.

We use a leave-one-out cross-validation scheme across samples to determine the optimal regularization parameter. The set of candidate λ-values is $\{0\} \cup \{\alpha \cdot 10^{k} \mid \alpha \in \{1, 2.5, 5, 7.5\},\ k \in \{-5, -4, -3, -2\}\}$. For the weighted squared loss, λ was further scaled by the median of the distribution of the $\Lambda$ values, in order to make the ranges of the employed regularization parameters comparable.
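The λ-grid and the leave-one-out scheme can be written down directly; in the sketch below, `fit_medecom` and `predict_error` are hypothetical stand-ins for a solver of problem (1) and a held-out scoring function, not real functions from the method's code:

```python
import numpy as np
from itertools import product

# Candidate grid {0} ∪ {alpha * 10^k : alpha in {1, 2.5, 5, 7.5}, k in {-5,...,-2}}.
lambdas = [0.0] + [a * 10.0**k for a, k in product([1, 2.5, 5, 7.5], range(-5, -1))]

def loo_cv_error(D, r, lam, fit_medecom, predict_error):
    """Mean held-out error of a given lambda under leave-one-out CV across samples."""
    n = D.shape[1]
    errs = []
    for j in range(n):
        train = np.delete(D, j, axis=1)          # drop sample j
        T = fit_medecom(train, r, lam)           # hypothetical solver for (1)
        errs.append(predict_error(T, D[:, j]))   # e.g. best simplex fit of column j
    return float(np.mean(errs))

# best_lam = min(lambdas, key=lambda lam: loo_cv_error(D, r, lam, fit, err))
```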

Figure 3 shows how the estimate $\hat A$ of the true mixture matrix $A$ is affected by the choice of the loss; Table 1 provides a quantitative evaluation.

Figure 3: $A$ (ground truth) versus the estimates $\hat A$ returned by the different variants of the regularized algorithm after matching with $T_{ref}$ (NeuN+ proportion per sample; legend: ground truth; SL for raw type I, λ = 1e−3; WSL for raw type I; SL for filtered type I, λ = 5e−4; WSL for filtered type I). The NeuN+ proportions of all $\hat A$ matrices are shown in the ascending order of those of $A$.

As we can see, SL underestimates the NeuN+ proportions for all samples on the noisy raw type I data, whereas WSL is not seriously affected by the noise and provides a reasonable estimate of the proportions. The reason for the failure of SL in this case is that the cross-validation routine selects the wrong parameter, which might be due to a noisier error measure. Both SL and WSL produce very good estimates on the filtered data (mean errors of 2.5% and 1.3%, respectively), where WSL again slightly outperforms SL. This experiment thus shows that integrating the information about the total intensity level and the per-CpG variance leads both to more robustness against noise and to a more accurate estimation of the proportions.

We also compared the $\hat T$ estimates against $T_{ref}$. However, as we do not have ground-truth information on $T$ but only the average of the reference profiles $T_{ref}$, this comparison should be seen rather as a sanity check. The results are summarized in Table 1. On average the estimation of the matrix $T$ is good in all cases, and, as expected, both WSL and SL estimate $T$ better on the filtered data. The large maximal absolute errors can be explained by the fact that $T_{ref}$ is not true ground truth and is thus correct only on average, not at the level of single CpG sites.

Table 1: Comparison of the estimates $(\hat T, \hat A)$ against $A$ and $T_{ref}$ on the raw and filtered type I datasets: for each of SL and WSL on the raw and filtered type I data, the mean absolute errors $\frac{1}{rn}\|\hat A - A\|_1$ and $\frac{1}{rm}\|\hat T - T_{ref}\|_1$ are reported together with the corresponding maximal absolute errors.

Conclusions: The Infinium microarray is an efficient tool for measuring DNA methylation at genome scale, yet the data it produces are contaminated by measurement noise and by biases of various nature and origin. We addressed these issues by assessing the importance of biologically motivated regularizers and of adequate noise modeling. In future work we will further validate the good results of WSL on the neural data [4] using additional datasets.

References

[1] C. Bock. Analysing and interpreting DNA methylation data. Nature Reviews Genetics, 13:705–719, 2012.
[2] S. Dedeurwaerder, M. Defrance, M. Bizet, E. Calonne, G. Bontempi, and F. Fuks. A comprehensive overview of Infinium HumanMethylation450 data processing. Briefings in Bioinformatics, 15(6):929–941, 2014.
[3] D. Donoho and V. Stodden. When does non-negative matrix factorization give a correct decomposition into parts? In NIPS, 2003.
[4] J. Guintivano, M. J. Aryee, and Z. A. Kaminsky. A cell epigenotype specific model for the correction of brain cellular heterogeneity bias and its application to age, brain region and major depression. Epigenetics, 8(3), 2013.
[5] E. A. Houseman, M. L. Kile, D. C. Christiani, T. A. Ince, K. T. Kelsey, and C. J. Marsit. Reference-free deconvolution of DNA methylation data and mediation by cell composition effects. BMC Bioinformatics, 17(259), 2016.
[6] P. Lutsik, M. Slawski, G. Gasparoni, M. Hein, and J. Walter. MeDeCom discovers and quantifies latent components of heterogeneous methylomes. Submitted.
[7] M. Slawski, M. Hein, and P. Lutsik. Matrix factorization with binary components. In NIPS, 2013.
[8] P. D. Tao and L. T. H. An. Difference of convex functions optimization algorithms (DCA) for globally minimizing nonconvex quadratic forms on Euclidean balls and spheres. Operations Research Letters, 19(5):207–216, 1996.
