Project topics for the course Special Course in Bioinformatics II: Machine Learning in Bioinformatics


Eric Bach, Céline Brouard, Anna Cichonska, Markus Heinonen, Huibin Shen, Juho Rousu

March 27

1 Retention time prediction using kernel methods

Eric Bach (eric.bach@aalto.fi)

Background: In untargeted metabolomics studies, complex biological samples containing possibly thousands of molecules are encountered. Tandem mass spectrometry (MS/MS) is a widely used technique for extracting patterns from biological samples in order to identify the molecules in them. However, the sensitivity of a mass spectrometer depends on the ability to reduce the complexity of the biological sample, e.g. to prevent MS/MS spectra from representing more than one molecule. Liquid chromatography (LC) is a technique for such complexity reduction. If a properly prepared biological sample is passed through an LC column, the molecules in the sample interact differently with the column's stationary phase. As a result, the molecules separate over time depending on their molecular properties: some pass through the column faster than others. The time at which a molecule leaves the column is called its retention time.

The retention time can serve as orthogonal information for metabolite identification: e.g. it can exclude molecular candidates that are expected to have a different retention time [Aic+15], or make it possible to distinguish diastereoisomers [SNV15]. Unfortunately, retention time measurements are available only for a small number of molecules and are not comparable between different chromatographic systems. On the other hand, the set of molecular candidates for identifying a single molecule (given its MS/MS spectrum) can contain thousands of molecules. Therefore, machine learning algorithms have been applied to predict retention times from the structure of a molecule [Aic+15; Fal+16].

Goal: In this project the student will implement and apply two kernelized regression approaches to predict the retention times of molecules from their structures.

Methods and materials: The student will be provided with a data set containing retention time measurements for 596 molecules, together with the molecular descriptors and fingerprints. The student will implement Kernel Ridge Regression (KRR) and magnitude-preserving kernel regression (MPKR) [CMR07], and apply both approaches to predict the retention times for the molecular structures in the data set. The student will then compare the performance of KRR and MPKR and investigate whether the magnitude-preserving error term leads to better retention time prediction.

Prerequisite: Basic knowledge of machine learning (especially kernel methods) and parameter estimation (i.e. cross-validation), linear algebra, and programming skills in R, MATLAB or Python. Some basic knowledge of molecular biology and chemoinformatics is beneficial.

[Aic+15] Fabian Aicheler et al. Retention Time Prediction Improves Identification in Nontargeted Lipidomics Approaches. In: Analytical Chemistry (2015).
[CMR07] Corinna Cortes et al. Magnitude-preserving Ranking Algorithms. In: Proceedings of the 24th International Conference on Machine Learning. ICML '07. ACM, 2007.
[Fal+16] Federico Falchi et al. Kernel-Based, Partial Least Squares Quantitative Structure-Retention Relationship Model for UPLC Retention Time Prediction: A Useful Tool for Metabolite Identification. In: Analytical Chemistry (2016).
[SNV15] Jan Stanstrup et al. PredRet: Prediction of Retention Time by Direct Mapping between Multiple Chromatographic Systems. In: Analytical Chemistry (2015).
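The KRR part of Project 1 can be sketched as follows. This is a minimal illustration, assuming a Gaussian kernel on descriptor vectors and the standard closed-form dual solution alpha = (K + lam * I)^-1 y. The function names, the kernel width gamma and the toy data are assumptions made for the example, not part of the course material, and in practice a numerical library would replace the small hand-written linear solver.

```python
import math


def gaussian_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def krr_fit(X, y, lam=1.0, kernel=gaussian_kernel):
    """Return the dual coefficients alpha = (K + lam * I)^-1 y."""
    n = len(X)
    K_reg = [[kernel(X[i], X[j]) + (lam if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    return solve(K_reg, y)


def krr_predict(X_train, alpha, x_new, kernel=gaussian_kernel):
    """Predict f(x_new) = sum_i alpha_i * k(x_i, x_new)."""
    return sum(a * kernel(x, x_new) for a, x in zip(alpha, X_train))


# Toy usage: three one-dimensional "molecules" with made-up retention times.
X = [[0.0], [1.0], [2.0]]
y = [1.0, 3.0, 2.0]
alpha = krr_fit(X, y, lam=0.1)
prediction = krr_predict(X, alpha, [1.5])
```

The regularisation parameter lam and the kernel width gamma would be chosen by cross-validation, as the prerequisites suggest.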

2 Metabolite identification from tandem mass spectra

Céline Brouard

Background: Metabolites are small molecules involved in the biological processes of organisms. Metabolite identification, an important problem in molecular biology, consists of determining the molecular structures of the unknown metabolites present in a biological sample. Information on these unknown metabolites can be obtained using tandem mass spectrometry. Recent progress in metabolite identification has come from machine learning-based methods.

Goal: The goal of this project is to implement the CSI:FingerID method described in the lecture. The method will be applied to the dataset used in the last CASMI (Critical Assessment of Small Molecule Identification) contest. The idea of this contest is to evaluate different metabolite identification methods on a common dataset. A set of training examples is provided, and for each given tandem mass spectrum the correct molecular structure has to be determined among a set of potential molecular candidates.

Materials and Methods: In this project, the student will implement the CSI:FingerID method. During the learning phase, the training MS/MS spectra are used to train a set of Support Vector Machine (SVM) classifiers to predict molecular properties. The SVM parameter C will be tuned using k-fold cross-validation on the training set, independently for each molecular property. In the prediction phase, the fingerprints of the unknown metabolites are predicted from their MS/MS spectra. The predicted fingerprints are then compared with the fingerprints of the candidate molecular structures to find the best match. The training dataset contains 234 tandem mass spectra and the challenge dataset consists of 127 tandem mass spectra. A list of candidates is provided for each challenge spectrum. For each molecule, fingerprints have been retrieved from PubChem and OpenBabel. As input, kernels on the tandem mass spectra will be provided.
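The final step of the prediction phase, comparing a predicted fingerprint with the candidate fingerprints, can be sketched as follows. This is a simplified illustration only: it scores each candidate by the fraction of fingerprint bits that agree with the prediction, which is an assumed stand-in and not necessarily the scoring used by CSI:FingerID. The function names and the toy fingerprints are hypothetical.

```python
def match_score(predicted, candidate):
    """Fraction of fingerprint bits on which the two binary vectors agree."""
    if len(predicted) != len(candidate):
        raise ValueError("fingerprints must have the same length")
    agree = sum(1 for p, c in zip(predicted, candidate) if p == c)
    return agree / len(predicted)


def rank_candidates(predicted, candidates):
    """Return candidate names sorted from best to worst match.

    `candidates` maps a candidate name to its binary fingerprint.
    """
    return sorted(candidates,
                  key=lambda name: match_score(predicted, candidates[name]),
                  reverse=True)


# Toy usage: one predicted fingerprint and three made-up candidates.
predicted = [1, 0, 1, 1]
candidates = {
    "cand_A": [1, 0, 1, 1],
    "cand_B": [0, 0, 1, 1],
    "cand_C": [0, 1, 0, 0],
}
ranking = rank_candidates(predicted, candidates)
```

In the project itself, each bit of `predicted` would come from one of the per-property SVM classifiers trained in the learning phase.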
Required background knowledge/skills: Programming skills (preferably MATLAB or R), basic knowledge of machine learning, and an understanding of the basic principles of support vector machines. Some knowledge of molecular biology will be beneficial.

[1] Heinonen, M., Shen, H., Zamboni, N., and Rousu, J. (2012). Metabolite identification and molecular fingerprint prediction through machine learning. Bioinformatics, 28(18).
[2] Shen, H., Dührkop, K., Böcker, S., and Rousu, J. (2014). Metabolite identification through multiple kernel learning on fragmentation trees. Bioinformatics, 30(12):i157-i164.
[3] Dührkop, K., Shen, H., Meusel, M., Rousu, J., and Böcker, S. (2015). Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proceedings of the National Academy of Sciences, 112(41).

3 Multiple kernel learning for drug-protein interaction prediction

Anna Cichonska (anna.cichonska@aalto.fi)

Background: Drug-like chemical compounds act mainly by modulating cellular targets, such as proteins. Experimental determination of interactions between chemical compounds and protein targets is time consuming and expensive. In recent years, much effort has therefore gone into developing computational methods that can provide fast, large-scale and systematic pre-screening of chemical probes. In particular, a lot of work has been devoted to compound-based interaction prediction methods, including quantitative structure-activity relationship (QSAR) models, which aim to relate structural properties of chemical molecules to their bioactivity profiles. Another class of computational methods, so-called target-based methods, focuses on evaluating similarities between the amino acid sequences or three-dimensional structures of protein targets. In these supervised learning approaches, models are trained using available bioactivity data together with either compound or protein information, which then allows predicting either new targets of a given drug or new drugs targeting a given protein. As a more recent class of computational modelling approaches, systems-based frameworks take advantage of the information available on both compounds and proteins.
A key assumption is that similar drug compounds interact with similar proteins. A proper representation and use of similarities, which amounts to a choice of kernel, is therefore the first critical prerequisite for high-quality drug-protein interaction (DPI) predictions. Classical kernel-based methods rely on a single kernel. However, such approaches are unlikely to be optimal when a growing variety of biological and molecular data sources is available simultaneously. Multiple kernel learning (MKL) methods search for an optimal combination of several kernels, which makes it possible to use different information sources simultaneously and to learn their importance for the prediction task; they are therefore receiving increasing attention. Typically, a binary-valued DPI prediction setup is employed. However, molecular interactions are not simple on-off relationships, and predicting real-valued binding affinities is more appealing.

Goal: The goal of the project is to compute several protein kernels and drug kernels, and then use them in an MKL regression framework to predict drug-protein binding affinities.

Materials and Methods: The data set consists of 50 drug compounds and 50 protein targets, a subset of the data from the experimental study of Metz et al. (2011). DPIs are represented as real values reflecting how tightly a compound binds to a protein. The student will calculate Tanimoto kernels for drug compounds based on several fingerprints implemented in the ChemmineR R package. For proteins, Smith-Waterman amino acid sequence alignment and the Generic String kernel will be used. The student can also choose to compute other molecular descriptors. Pairwise kernels that directly relate drug-protein pairs will then be constructed by taking the Kronecker product of each pair of drug kernel and protein kernel.

The student will use the pairwise kernels with the two-stage MKL algorithm ALIGNF. In the first stage, the kernel mixture weights are determined by maximising the centred alignment, a matrix similarity measure, between the combined kernel and the ideal, so-called target kernel derived from the label values. In the second stage, the combined kernel is used with Kernel Ridge Regression (KRR) as the prediction model. The student will be provided with a script for calculating the kernel mixture weights (first stage) but should implement KRR (second stage). The UNIMKL algorithm will form a baseline model, in which all kernel mixture weights are equal to 1/P, P being the number of input kernels. The student will implement nested cross-validation to tune the regularisation parameter λ of KRR and assess the predictive performance of the model.

Prerequisite: Programming skills (MATLAB, R, Python), basic knowledge of machine learning. Some knowledge of chemoinformatics will be beneficial.

[1] Ding H, Takigawa I, Mamitsuka H, Zhu S. Similarity-based machine learning methods for predicting drug-target interactions: a brief review. Briefings in Bioinformatics 2014; 15(5).
[2] Cichonska A, Rousu J, Aittokallio T. Identification of drug candidates and repurposing opportunities through compound-target interaction networks. Expert Opinion on Drug Discovery 2015; 10(12).
[3] Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 2008; 24(13).
[4] Giguere S, Marchand M, Laviolette F, Drouin A, Corbeil J. Learning a peptide-protein binding affinity predictor with kernel ridge regression. BMC Bioinformatics 2013; 14(1): 82.
[5] Cortes C, Mohri M, Rostamizadeh A. Algorithms for learning kernels based on centered alignment. Journal of Machine Learning Research 2012; 13(Mar).
[6] Metz JT, Johnson EF, Soni NB et al. Navigating the kinome. Nature Chemical Biology 2011; 7(4).
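The kernel construction in Project 3 can be sketched in plain Python. The sketch assumes that fingerprints are given as collections of on-bit indices; all function names are illustrative, and neither ChemmineR nor the provided ALIGNF script is used here. It shows a Tanimoto kernel, the Kronecker product that forms a pairwise drug-protein kernel, and the uniform weighting 1/P of the UNIMKL baseline.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as on-bit index sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def kernel_matrix(items, kernel):
    """Square kernel matrix K[i][j] = kernel(items[i], items[j])."""
    return [[kernel(x, z) for z in items] for x in items]


def kronecker(A, B):
    """Kronecker product of A and B.

    Row (i, k) and column (j, l) of the result hold A[i][j] * B[k][l], so
    with A a drug kernel and B a protein kernel the result is a kernel
    between (drug, protein) pairs.
    """
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def uniform_combination(kernels):
    """UNIMKL-style baseline: average P kernel matrices with weights 1/P."""
    P = len(kernels)
    n = len(kernels[0])
    return [[sum(K[i][j] for K in kernels) / P for j in range(n)]
            for i in range(n)]


# Toy usage: two drugs (on-bit sets) and a made-up 2x2 protein kernel.
drug_fps = [{1, 2, 3}, {2, 3, 4}]
K_drug = kernel_matrix(drug_fps, tanimoto)
K_prot = [[1.0, 0.2], [0.2, 1.0]]
K_pair = kronecker(K_drug, K_prot)  # 4x4 kernel over drug-protein pairs
```

With 50 drugs and 50 proteins, each pairwise kernel is a 2500 x 2500 matrix, one per combination of drug kernel and protein kernel.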

4 Differential gene expression analysis

Markus Heinonen

Background: In differential gene expression analysis, statistical methods are applied to find which genes are over- or under-expressed with respect to control baseline expression levels. The results are subsequently analysed for biological significance by inspecting the functional annotations of these genes, to gain insight into the cellular processes of interest. In static differential testing, expression matrices are compared using well-defined statistics. In dynamic differential testing, time series or interpolation models over time are compared using frequentist or Bayesian statistics. In both cases a large-scale view of the expression patterns of thousands of genes emerges.

The key question in differential analysis is the choice of model for the expression patterns. Genes commonly exhibit non-stationarity, where the underlying dynamics can change abruptly through perturbation or regulation. The sparse and often irregularly sampled data warrant careful modeling of the signals. Typically, the underlying model family for interpolation and data representation is Gaussian processes (GPs). Differential expression can be tested against a constant level, between two conditions, or between multiple conditions.

Goal: The goal of the project is to model gene expression time series with Gaussian processes and to apply differential testing to find genes that are differentially regulated between conditions.

Materials and methods: In this project the response of Arabidopsis plant gene expression to Botrytis infection is analysed. The gene expression time series are modeled using Gaussian processes, and two-sample interval testing is carried out to find out which genes are differentially expressed in the infection response, and when. The analysis results in a temporal cascade of differential gene expression.
The plant gene expression measurements are large-scale and of high quality, with numerous biological and technical replicates. The data is located in the GEO database. The dataset consists of 22 time points (2, 4, ..., 48 hours) for infected and normal plant cells, with 4 biological replicates (plants) and 3 technical replicates, for almost 10,000 gene probes.

The GP modeling can be performed with any GP implementation (e.g. gpml/gpstuff on Matlab, gptk/gpfit on R, pygp/gpy on Python). A suitable learning criterion, such as marginal likelihood or cross-validation, should be used. An appropriate kernel prior should be chosen as well, with the Gaussian kernel being a common choice. Two-sample testing should be implemented according to the Bayesian EMLL framework (see slides).

Finally, the differentially expressed genes can be studied in many ways. These include visualisation over time, clustering of their expression patterns, or consideration of their functional classifications (such as GO terms, KEGG pathways, InterPro families or PANTHER functional classification), which are found in several databases, for instance the DAVID and BioGPS web servers.

Optionally, the student can experiment with non-stationary GPs, where the observation noise or signal variance is time dependent. The GPstuff package contains an implementation of non-stationary GPs. The goal is to analyse which gene expression time series warrant a non-stationary GP, and to analyse the model improvement and runtime effects of adding non-stationarity [see Tolvanen et al. 2014]. More detailed instructions will be available from the instructor.

Required background knowledge/skills: Programming skills (Matlab, R, Python), basic statistics, basic Bayesian statistics and machine learning. Some knowledge of biology will be useful.

Heinonen et al. (2015): Detecting time periods of differential gene expression using Gaussian processes: An application to endothelial cells exposed to radiotherapy dose fraction. Bioinformatics, 31.
Rasmussen & Williams (2006): Gaussian processes for machine learning [sections 2, 4.2 and 5.4].
Windram et al. (2012): Arabidopsis Defense against Botrytis cinerea: Chronology and Regulation Deciphered by High-Resolution Temporal Transcriptomic Analysis. The Plant Cell, 24.
Stegle et al. (2010): A Robust Bayesian Two-Sample Test for Detecting Intervals of Differential Gene Expression in Microarray Time Series. Journal of Computational Biology, 17.
Tolvanen et al. (2014): Expectation propagation for nonstationary heteroscedastic Gaussian process regression. In IEEE MLSP.
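For reference, the standard GP regression quantities behind the modelling step of Project 4 are summarised below (see Rasmussen & Williams, sections 2 and 5.4). The notation is an assumption of this summary rather than that of the course slides: t_1, ..., t_n are the observation times, y the vector of expression values, K the kernel matrix with entries k(t_i, t_j), k_* the vector of kernel values between the training times and a test time t_*, and sigma_n^2 the observation noise variance. The EMLL two-sample test itself is not reproduced here.

```latex
% Gaussian kernel with signal variance \sigma_f^2 and length-scale \ell
k(t, t') = \sigma_f^2 \exp\!\left( -\frac{(t - t')^2}{2 \ell^2} \right)

% Posterior mean and variance at a test time t_*
\mu_* = \mathbf{k}_*^\top \left( K + \sigma_n^2 I \right)^{-1} \mathbf{y}
\qquad
\sigma_*^2 = k(t_*, t_*) - \mathbf{k}_*^\top \left( K + \sigma_n^2 I \right)^{-1} \mathbf{k}_*

% Log marginal likelihood, maximised over \theta = (\sigma_f, \ell, \sigma_n)
\log p(\mathbf{y} \mid \mathbf{t}, \theta)
  = -\tfrac{1}{2} \mathbf{y}^\top \left( K + \sigma_n^2 I \right)^{-1} \mathbf{y}
    - \tfrac{1}{2} \log \left| K + \sigma_n^2 I \right|
    - \tfrac{n}{2} \log 2\pi
```

Maximising the last expression over the hyperparameters is one way to realise the marginal likelihood learning criterion mentioned in the project description.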

5 Learning molecular representation with an autoencoder

Huibin Shen

Background: Current representations of molecules include binary vector representations such as molecular fingerprints, string representations such as InChI or SMILES, and 2D/3D graphs. Many applications involving molecules are built on some such representation. Learning a better representation of the data is at the core of deep learning. Compound databases now contain molecules on the scale of millions. With deep learning, it becomes possible to learn a compact and continuous vector representation from them.

Goal: In this project, we will use a variational autoencoder to learn such a representation and test it in a metabolite identification pipeline. We will first test the autoencoder on a subset of 5M molecules with either the fingerprint representation or the SMILES string representation. The code and data are already available. The student will run the code on GPU nodes on Triton.

Prerequisite: Python and basic knowledge of machine learning and deep learning.

[1] Gómez-Bombarelli, R., Duvenaud, D., Hernández-Lobato, J. M., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. (2016). Automatic chemical design using a data-driven continuous representation of molecules. arXiv preprint.
[2] Kalchbrenner, N., Grefenstette, E., and Blunsom, P. (2014). A convolutional neural network for modelling sentences. arXiv preprint.
[3] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint.
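Two building blocks of a variational autoencoder can be sketched in plain Python, assuming a diagonal Gaussian encoder and a standard normal prior as in Kingma and Welling [3]: the KL regulariser of the training objective and the reparameterised latent sample. The function names are illustrative, and this is not the course code, which would use a deep learning framework on GPUs.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def sample_latent(mu, log_var, rng=random):
    """Reparameterisation trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


# Toy usage: a two-dimensional latent code for one (made-up) molecule.
mu, log_var = [0.3, -1.2], [-0.5, 0.1]
z = sample_latent(mu, log_var, random.Random(0))
penalty = kl_to_standard_normal(mu, log_var)
```

During training, this KL term is added to the reconstruction loss of the decoder; the mean vector `mu` is then a natural candidate for the compact continuous representation of a molecule.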


More information

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang Deep Learning Basics Lecture 7: Factor Analysis Princeton University COS 495 Instructor: Yingyu Liang Supervised v.s. Unsupervised Math formulation for supervised learning Given training data x i, y i

More information

Discriminating precursors of common fragments for large-scale metabolite profiling by triple quadrupole mass spectrometry

Discriminating precursors of common fragments for large-scale metabolite profiling by triple quadrupole mass spectrometry Bioinformatics, 31(12), 2015, 2017 2023 doi: 10.1093/bioinformatics/btv085 Advance Access Publication Date: 16 February 2015 Original Paper Systems biology Discriminating precursors of common fragments

More information

Background: Imagine it is time for your lunch break, you take your sandwich outside and you sit down to enjoy your lunch with a beautiful view of

Background: Imagine it is time for your lunch break, you take your sandwich outside and you sit down to enjoy your lunch with a beautiful view of Background: Imagine it is time for your lunch break, you take your sandwich outside and you sit down to enjoy your lunch with a beautiful view of Montana s Rocky Mountains. As you look up, you see what

More information

Support vector machines, Kernel methods, and Applications in bioinformatics

Support vector machines, Kernel methods, and Applications in bioinformatics 1 Support vector machines, Kernel methods, and Applications in bioinformatics Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group Machine Learning in Bioinformatics conference,

More information

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland Navigation in Chemical Space Towards Biological Activity Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland Data Explosion in Chemistry CAS 65 million molecules CCDC 600 000 structures

More information

Holdout and Cross-Validation Methods Overfitting Avoidance

Holdout and Cross-Validation Methods Overfitting Avoidance Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest

More information

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Networks & pathways. Hedi Peterson MTAT Bioinformatics Networks & pathways Hedi Peterson (peterson@quretec.com) MTAT.03.239 Bioinformatics 03.11.2010 Networks are graphs Nodes Edges Edges Directed, undirected, weighted Nodes Genes Proteins Metabolites Enzymes

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA

More information

Computational Genomics

Computational Genomics Computational Genomics http://www.cs.cmu.edu/~02710 Introduction to probability, statistics and algorithms (brief) intro to probability Basic notations Random variable - referring to an element / event

More information

profileanalysis Innovation with Integrity Quickly pinpointing and identifying potential biomarkers in Proteomics and Metabolomics research

profileanalysis Innovation with Integrity Quickly pinpointing and identifying potential biomarkers in Proteomics and Metabolomics research profileanalysis Quickly pinpointing and identifying potential biomarkers in Proteomics and Metabolomics research Innovation with Integrity Omics Research Biomarker Discovery Made Easy by ProfileAnalysis

More information

Outline Introduction OLS Design of experiments Regression. Metamodeling. ME598/494 Lecture. Max Yi Ren

Outline Introduction OLS Design of experiments Regression. Metamodeling. ME598/494 Lecture. Max Yi Ren 1 / 34 Metamodeling ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 1, 2015 2 / 34 1. preliminaries 1.1 motivation 1.2 ordinary least square 1.3 information

More information

CSCE555 Bioinformatics. Protein Function Annotation

CSCE555 Bioinformatics. Protein Function Annotation CSCE555 Bioinformatics Protein Function Annotation Why we need to do function annotation? Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007 What s function? The

More information

arxiv: v1 [stat.ml] 6 Dec 2018

arxiv: v1 [stat.ml] 6 Dec 2018 missiwae: Deep Generative Modelling and Imputation of Incomplete Data arxiv:1812.02633v1 [stat.ml] 6 Dec 2018 Pierre-Alexandre Mattei Department of Computer Science IT University of Copenhagen pima@itu.dk

More information

Prediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines

Prediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines Article Prediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines Yun-Fei Wang, Huan Chen, and Yan-Hong Zhou* Hubei Bioinformatics and Molecular Imaging Key Laboratory,

More information

Unsupervised machine learning

Unsupervised machine learning Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels

More information

BST 226 Statistical Methods for Bioinformatics David M. Rocke. January 22, 2014 BST 226 Statistical Methods for Bioinformatics 1

BST 226 Statistical Methods for Bioinformatics David M. Rocke. January 22, 2014 BST 226 Statistical Methods for Bioinformatics 1 BST 226 Statistical Methods for Bioinformatics David M. Rocke January 22, 2014 BST 226 Statistical Methods for Bioinformatics 1 Mass Spectrometry Mass spectrometry (mass spec, MS) comprises a set of instrumental

More information

Machine Learning! in just a few minutes. Jan Peters Gerhard Neumann

Machine Learning! in just a few minutes. Jan Peters Gerhard Neumann Machine Learning! in just a few minutes Jan Peters Gerhard Neumann 1 Purpose of this Lecture Foundations of machine learning tools for robotics We focus on regression methods and general principles Often

More information

Gaussian Processes: We demand rigorously defined areas of uncertainty and doubt

Gaussian Processes: We demand rigorously defined areas of uncertainty and doubt Gaussian Processes: We demand rigorously defined areas of uncertainty and doubt ACS Spring National Meeting. COMP, March 16 th 2016 Matthew Segall, Peter Hunt, Ed Champness matt.segall@optibrium.com Optibrium,

More information

Efficient Complex Output Prediction

Efficient Complex Output Prediction Efficient Complex Output Prediction Florence d Alché-Buc Joint work with Romain Brault, Alex Lambert, Maxime Sangnier October 12, 2017 LTCI, Télécom ParisTech, Institut-Mines Télécom, Université Paris-Saclay

More information

Xia Ning,*, Huzefa Rangwala, and George Karypis

Xia Ning,*, Huzefa Rangwala, and George Karypis J. Chem. Inf. Model. XXXX, xxx, 000 A Multi-Assay-Based Structure-Activity Relationship Models: Improving Structure-Activity Relationship Models by Incorporating Activity Information from Related Targets

More information

Maximum Direction to Geometric Mean Spectral Response Ratios using the Relevance Vector Machine

Maximum Direction to Geometric Mean Spectral Response Ratios using the Relevance Vector Machine Maximum Direction to Geometric Mean Spectral Response Ratios using the Relevance Vector Machine Y. Dak Hazirbaba, J. Tezcan, Q. Cheng Southern Illinois University Carbondale, IL, USA SUMMARY: The 2009

More information

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

Least Absolute Shrinkage is Equivalent to Quadratic Penalization

Least Absolute Shrinkage is Equivalent to Quadratic Penalization Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr

More information

Learning in Bayesian Networks

Learning in Bayesian Networks Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

More information

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi

More information

Advanced Introduction to Machine Learning CMU-10715

Advanced Introduction to Machine Learning CMU-10715 Advanced Introduction to Machine Learning CMU-10715 Gaussian Processes Barnabás Póczos http://www.gaussianprocess.org/ 2 Some of these slides in the intro are taken from D. Lizotte, R. Parr, C. Guesterin

More information

BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS

BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS Oloyede I. Department of Statistics, University of Ilorin, Ilorin, Nigeria Corresponding Author: Oloyede I.,

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

SUSPECT AND NON-TARGET SCREENING OF ORGANIC MICROPOLLUTANTS IN WASTEWATER THROUGH THE DEVELOPMENT OF A LC-HRMS BASED WORKFLOW

SUSPECT AND NON-TARGET SCREENING OF ORGANIC MICROPOLLUTANTS IN WASTEWATER THROUGH THE DEVELOPMENT OF A LC-HRMS BASED WORKFLOW SUSPECT AND NON-TARGET SCREENING OF ORGANIC MICROPOLLUTANTS IN WASTEWATER THROUGH THE DEVELOPMENT OF A LC-HRMS BASED WORKFLOW Pablo Gago-Ferrero Laboratory of Analytical Chemistry Department of Chemistry

More information

Introduction to Gaussian Process

Introduction to Gaussian Process Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression

More information

SUB-CELLULAR LOCALIZATION PREDICTION USING MACHINE LEARNING APPROACH

SUB-CELLULAR LOCALIZATION PREDICTION USING MACHINE LEARNING APPROACH SUB-CELLULAR LOCALIZATION PREDICTION USING MACHINE LEARNING APPROACH Ashutosh Kumar Singh 1, S S Sahu 2, Ankita Mishra 3 1,2,3 Birla Institute of Technology, Mesra, Ranchi Email: 1 ashutosh.4kumar.4singh@gmail.com,

More information

BMD645. Integration of Omics

BMD645. Integration of Omics BMD645 Integration of Omics Shu-Jen Chen, Chang Gung University Dec. 11, 2009 1 Traditional Biology vs. Systems Biology Traditional biology : Single genes or proteins Systems biology: Simultaneously study

More information

Correlation Autoencoder Hashing for Supervised Cross-Modal Search

Correlation Autoencoder Hashing for Supervised Cross-Modal Search Correlation Autoencoder Hashing for Supervised Cross-Modal Search Yue Cao, Mingsheng Long, Jianmin Wang, and Han Zhu School of Software Tsinghua University The Annual ACM International Conference on Multimedia

More information

Chemical Data Retrieval and Management

Chemical Data Retrieval and Management Chemical Data Retrieval and Management ChEMBL, ChEBI, and the Chemistry Development Kit Stephan A. Beisken What is EMBL-EBI? Part of the European Molecular Biology Laboratory International, non-profit

More information

Agilent METLIN Personal Metabolite Database and Library MORE CONFIDENCE IN COMPOUND IDENTIFICATION

Agilent METLIN Personal Metabolite Database and Library MORE CONFIDENCE IN COMPOUND IDENTIFICATION Agilent METLIN Personal Metabolite Database and Library MORE CONFIDENCE IN COMPOUND IDENTIFICATION COMPOUND IDENTIFICATION AT YOUR FINGERTIPS Compound identifi cation is a key element in untargeted metabolomics

More information

CS 231A Section 1: Linear Algebra & Probability Review

CS 231A Section 1: Linear Algebra & Probability Review CS 231A Section 1: Linear Algebra & Probability Review 1 Topics Support Vector Machines Boosting Viola-Jones face detector Linear Algebra Review Notation Operations & Properties Matrix Calculus Probability

More information

Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report

Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report Hujia Yu, Jiafu Wu [hujiay, jiafuwu]@stanford.edu 1. Introduction Housing prices are an important

More information

Bayesian Deep Learning

Bayesian Deep Learning Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference

More information

Introduction to Deep Learning

Introduction to Deep Learning Introduction to Deep Learning Some slides and images are taken from: David Wolfe Corne Wikipedia Geoffrey A. Hinton https://www.macs.hw.ac.uk/~dwcorne/teaching/introdl.ppt Feedforward networks for function

More information

CS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang

CS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang CS 231A Section 1: Linear Algebra & Probability Review Kevin Tang Kevin Tang Section 1-1 9/30/2011 Topics Support Vector Machines Boosting Viola Jones face detector Linear Algebra Review Notation Operations

More information

6.036 midterm review. Wednesday, March 18, 15

6.036 midterm review. Wednesday, March 18, 15 6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that

More information

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55

More information

Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers

Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Erin Allwein, Robert Schapire and Yoram Singer Journal of Machine Learning Research, 1:113-141, 000 CSE 54: Seminar on Learning

More information

Introduction to Machine Learning Midterm Exam Solutions

Introduction to Machine Learning Midterm Exam Solutions 10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

More information

Mixture models for analysing transcriptome and ChIP-chip data

Mixture models for analysing transcriptome and ChIP-chip data Mixture models for analysing transcriptome and ChIP-chip data Marie-Laure Martin-Magniette French National Institute for agricultural research (INRA) Unit of Applied Mathematics and Informatics at AgroParisTech,

More information

Deep Generative Models. (Unsupervised Learning)

Deep Generative Models. (Unsupervised Learning) Deep Generative Models (Unsupervised Learning) CEng 783 Deep Learning Fall 2017 Emre Akbaş Reminders Next week: project progress demos in class Describe your problem/goal What you have done so far What

More information

Supervised Machine Learning: Learning SVMs and Deep Learning. Klaus-Robert Müller!!et al.!!

Supervised Machine Learning: Learning SVMs and Deep Learning. Klaus-Robert Müller!!et al.!! Supervised Machine Learning: Learning SVMs and Deep Learning Klaus-Robert Müller!!et al.!! Today s Tutorial Machine Learning introduction: ingredients for ML Kernel Methods and Deep networks with explaining

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation

More information

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Inferring Transcriptional Regulatory Networks from Gene Expression Data II Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday

More information

Logistic Regression. COMP 527 Danushka Bollegala

Logistic Regression. COMP 527 Danushka Bollegala Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will

More information

CS-E3210 Machine Learning: Basic Principles

CS-E3210 Machine Learning: Basic Principles CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction

More information

COMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation

COMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation COMP 55 Applied Machine Learning Lecture 2: Bayesian optimisation Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp55 Unless otherwise noted, all material posted

More information