A General Method for Combining Predictors Tested on Protein Secondary Structure Prediction
Jakob V. Hansen
Department of Computer Science, University of Aarhus
Ny Munkegade, Bldg. 540, DK-8000 Aarhus C, Denmark

Anders Krogh
Center for Biological Sequence Analysis, Technical University of Denmark
Building 208, DK-2800 Lyngby, Denmark

Abstract

Ensemble methods, which combine several classifiers, have been applied successfully to decrease the generalization error of machine learning methods. In most ensemble methods the ensemble members are combined by a weighted summation of their outputs, called the linear average predictor. The logarithmic opinion pool ensemble method instead uses a multiplicative combination of the ensemble members, which treats the outputs of the ensemble members as independent probabilities. The advantage of the logarithmic opinion pool is its connection to the Kullback-Leibler error function, which can be decomposed into two terms: an average of the errors of the ensemble members, and the ambiguity. The ambiguity is independent of the target function and can be estimated from unlabeled data. The advantage of the decomposition is that an unbiased estimate of the generalization error of the ensemble can be obtained while training still uses the full training set. These properties can be used to improve classification. The logarithmic opinion pool ensemble method is tested on the prediction of protein secondary structure. The focus is on how much improvement the general ensemble method can give, rather than on outperforming existing methods, because that typically involves several more steps of refinement.

1 Introduction

Empirically it has proven very effective to average over an ensemble of neural networks, rather than to use a single neural network, in order to improve classification performance. See [1, 2] for examples in protein secondary structure prediction and [3] for an overview of applications in molecular biology. Here we present a general ensemble method and show how the error of an ensemble can be written as the average error of the ensemble members minus a term measuring the disagreement between the members (called the ensemble ambiguity). This proves that the ensemble is always better than the average performance of its members, substantiating the empirical observations. The ambiguity is independent of the training targets and can thus be estimated from unlabeled data. Therefore an unbiased estimate of the ensemble error can be found when combining cross-validation and ensemble training. This error estimate can be used for determining when to stop the training of the ensemble.
We test this general approach to ensemble training on the secondary structure prediction problem. This problem is well suited for the method, because thousands of proteins without known structure are available for estimation of the ambiguity. It is shown that the estimated error works well for stopping training, that the optimal training time is longer than for a single network, and that the method outperforms other ensemble methods tested on the same data.

2 The ensemble decomposition

An ensemble consists of M predictors f_i, which are combined into the combined predictor F. We consider the case where each predictor outputs a probability vector {f_i^1, ..., f_i^N}, where f_i^j is the estimated probability that input x belongs to class c_j. The ensemble has M associated coefficients {α_1, ..., α_M} obeying Σ_i α_i = 1 and α_i ≥ 0. It is very common to define the ensemble predictor as F_LAP = \sum_{i=1}^{M} \alpha_i f_i, which is called the linear average predictor (LAP). The combined predictor for the logarithmic opinion pool (LOP) of the ensemble members is

    F^j = \frac{1}{Z} \exp\left( \sum_{i=1}^{M} \alpha_i \log f_i^j \right),    (1)

where Z is a normalization factor given by Z = \sum_{j=1}^{N} \exp\left( \sum_{i=1}^{M} \alpha_i \log f_i^j \right). This combination rule is non-linear and asymmetric, as opposed to the LAP. Unless otherwise stated, this is the combined predictor we consider in this work.

An example set T consists of input-output pairs, where the output is also a probability vector {t^1, ..., t^N}. The examples are assumed to be generated by a target function t. The difference between the target t and the combined predictor F is measured by the Kullback-Leibler (KL) error function

    E(t, F) = \sum_{j=1}^{N} t^j \log\frac{t^j}{F^j}.    (2)

The error is zero if F^j is equal to t^j for all j. For all appearances of this error function the mean over the training set is implicitly taken. If the target probabilities are restricted to one and zero, the error function (2) reduces to E(t, F) = -\log F^k, where t^k is one. This would be the case if the error function is used on a training set consisting of class examples. With the LOP in (1), the error in (2) can be decomposed into two terms,

    E(t, F) = \sum_{i=1}^{M} \alpha_i E(t, f_i) - \sum_{i=1}^{M} \alpha_i E(F, f_i) = \bar{E}(t, f) - A(f),    (3)

where A(f) is the ambiguity and the bar denotes the weighted ensemble mean. By using Jensen's inequality it can be shown that the ambiguity is always greater than or equal to zero. This implies that the error of the combined predictor is always less than or equal to the mean of the errors of the ensemble members.
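To make the combination rule and the decomposition concrete, the following NumPy sketch (ours, not code from the paper) applies equation (1) to the outputs of a few hypothetical ensemble members and verifies equation (3) numerically; the member probabilities, coefficients, and function names are illustrative assumptions.

```python
import numpy as np

def lop_combine(member_probs, alphas):
    """Logarithmic opinion pool (eq. 1): weighted geometric mean of the
    members' class probabilities, renormalized over the N classes."""
    # member_probs: array of shape (M, N); alphas: shape (M,), summing to 1.
    log_F = np.sum(alphas[:, None] * np.log(member_probs), axis=0)
    F = np.exp(log_F)
    return F / F.sum()

def kl_error(t, F):
    """Kullback-Leibler error of eq. 2 for a single example."""
    mask = t > 0
    return np.sum(t[mask] * np.log(t[mask] / F[mask]))

# Toy example: M = 3 members, N = 3 classes (helix, sheet, coil).
rng = np.random.default_rng(0)
member_probs = rng.dirichlet(np.ones(3), size=3)   # each row sums to 1
alphas = np.full(3, 1.0 / 3.0)                     # uniform coefficients
t = np.array([1.0, 0.0, 0.0])                      # hard target

F = lop_combine(member_probs, alphas)
avg_member_error = np.sum([a * kl_error(t, f) for a, f in zip(alphas, member_probs)])
ambiguity = np.sum([a * kl_error(F, f) for a, f in zip(alphas, member_probs)])

# Decomposition (3): ensemble error = average member error - ambiguity.
assert np.isclose(kl_error(t, F), avg_member_error - ambiguity)
```

The assertion holds because the normalization factor Z cancels between the two sums in (3), so the identity is exact up to floating-point rounding.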
We see that diversity among the ensemble members, without an increase in the error of each member, will lower the error of the combined predictor. The decomposition in (3) also applies to the LAP ensemble with the quadratic error E(t, F) = (t - F)^2 replacing (2) [4]. The LOP decomposition is due to Tom Heskes [5, 6], although it is derived there in a slightly different setting.

The ambiguity term in (3) is independent of the target probability, which means that the ambiguity can be estimated using unlabeled data, or, if the input distribution is known, even without any data. Assuming we have estimates of the generalization error of the ensemble members and an estimate of the ambiguity, the estimated generalization error comes directly from (3). This can be achieved in the cross-validation ensemble, where the training set is divided into M equally sized portions. Ensemble member f_i is trained on all the portions except portion i. The error on portion i is independent of the training of f_i and can be used to estimate that member's generalization error. With the estimated ambiguity, this gives us a method for obtaining an unbiased estimate of the ensemble error while still using all the training data.

3 Tests on the protein secondary structure problem

The method is tested on the standard secondary structure problem, in which the task is to predict the three-class secondary structure labels: α-helix, β-sheet, or coil, which is anything else. There is much work on secondary structure prediction; for overviews see [3]. The currently most used method is probably the one presented in [1]. The LOP ensemble is suitable for the protein problem for several reasons. Firstly, the protein problem is a classification problem, and the LOP ensemble method is tailor-made for classification. Secondly, there is a huge amount of unlabeled data, i.e. proteins of unknown structure, which can be used to estimate the ambiguity.

The ensemble members are chosen to be neural networks. The output of a neural network is post-processed by the SOFTMAX function defined as f^j = e^{g_j} / Σ_k e^{g_k}, where g_j is the linear output (the weighted sum of the hidden units) of output unit j. This ensures that the ensemble members obey Σ_j f_i^j = 1 and f_i^j ≥ 0. The ensemble coefficients are uniform. Learning is done with back-propagation and momentum. A window of 13 amino acids is used, together with a sparse coding of each amino acid into a vector of size 20, so each network has 260 inputs and three outputs. Each network has a hidden layer with 50 nodes. All examples in the training set are used once and only once during each epoch. Weights are updated after a certain number of randomly chosen examples have been presented (batch-update). During training the batch size is increased every tenth epoch by a number of examples equal to the square root of the training set size, and the learning rate is decreased in inverse proportion to the batch size. Training is stopped after 500 epochs, or if the validation error has increased by more than 10 % over the best value.
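As an illustration of the input representation and output post-processing described above, here is a minimal sketch (our own, with an assumed amino-acid ordering and all-zero padding for window positions that fall outside the chain):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # ordering is our assumption
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode_window(sequence, center, window=13):
    """Sparse (one-hot) coding of a window of residues centred on `center`:
    13 positions x 20 amino acids = 260 inputs per example."""
    half = window // 2
    x = np.zeros((window, len(AMINO_ACIDS)))
    for k, pos in enumerate(range(center - half, center + half + 1)):
        if 0 <= pos < len(sequence) and sequence[pos] in AA_INDEX:
            x[k, AA_INDEX[sequence[pos]]] = 1.0   # positions outside the chain stay all-zero
    return x.ravel()                              # shape (260,)

def softmax(g):
    """SOFTMAX post-processing of the linear outputs g_j, so that the three
    class probabilities are non-negative and sum to one."""
    g = g - g.max()                               # numerical stability
    e = np.exp(g)
    return e / e.sum()

x = encode_window("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", center=10)
print(x.shape)                                    # (260,)
print(softmax(np.array([1.2, -0.3, 0.5])))        # probabilities for helix, sheet, coil
```

Each training example thus becomes a 260-dimensional sparse vector paired with one of the three structure classes.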
Our data set consists of 650 non-homologous protein sequences [7]. Four-fold cross-validation is used for each test, which means that the data set is divided into four equally sized sets. Training is done on three of the sets, while testing is done on the remaining set. The test set is rotated for each cross-validation run, and the average of the four test runs is calculated. In the cross-validation ensemble the validation error is the estimate of the generalization error calculated from (3), using a set of unlabeled amino acids to estimate the ambiguity. Seven ensemble members are used, each trained on six of the seven portions of the training set and validated on the remaining portion. Apart from the cross-validation ensemble, we also test what we will term a simple ensemble, in which all the ensemble members are trained on the same training set, and the validation error is calculated from an independent set of labeled data. Note that there is still the four-fold cross-validation as an outer loop for both ensemble methods.

Every tenth epoch the ensemble validation error, the ensemble test error, and the average test error of the individual ensemble members were calculated. The graphs in Figure 1 show these values for a single test run of the cross-validation ensemble. It is clearly seen that the ensemble is better than the average of the individual members, as it should be. The cross-validation ensemble reaches a lower generalization error (represented by the test error) than the simple ensemble. This can be explained by the fact that the difference in training sets makes the ensemble members differ more than if they are trained on the same data, and this increases the ambiguity, which in turn lowers the generalization error.

Figure 1: Cross-validation LOP ensemble. The test-set KL error of the ensemble, the average test-set KL error of the ensemble members, and the estimated generalization KL error of the ensemble are plotted against training epochs.

In a practical application, one would select the ensemble at the training epoch with the lowest validation error. The various errors are shown for the ensemble selected in this manner. Using the validation error to select the ensemble makes training dependent on the validation error, and therefore the estimate of the generalization error becomes biased. We will call the time of lowest validation error the stopping time. Table 1 gives the Kullback-Leibler errors and misclassification rates at stopping time for the test runs.

Table 1: The Kullback-Leibler errors and misclassification rates at stopping time.

    Combination rule          LOP                LOP       LAP
    Validation error          Cross-validation   Simple    Cross-validation
    Train error function      KL                 KL        MSE
    Test error
    Average of indiv.         ±                  ±         ± 0.12
    Misclassification rate
    Ambiguity
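The following sketch (ours, not the authors' code) shows how such a validation error can be assembled from equation (3): the held-out KL errors of the members are averaged with the ensemble coefficients, and the ambiguity is computed purely from the members' outputs on unlabeled windows. Array shapes and names are assumptions.

```python
import numpy as np

def lop_combine_batch(member_probs, alphas):
    """LOP combination (eq. 1) applied to a batch: member_probs has shape (M, examples, N)."""
    log_F = np.tensordot(alphas, np.log(member_probs), axes=1)   # shape (examples, N)
    F = np.exp(log_F)
    return F / F.sum(axis=1, keepdims=True)

def mean_kl(t, F):
    """Mean KL error (eq. 2) over a set of examples; t and F have shape (examples, N)."""
    mask = t > 0
    return np.sum(np.where(mask, t * np.log(np.where(mask, t / F, 1.0)), 0.0)) / len(t)

def estimated_generalization_error(heldout_errors, member_probs_unlabeled, alphas):
    """Eq. (3): weighted held-out KL error of the members minus the ambiguity,
    where the ambiguity is computed from the members' outputs on unlabeled data alone."""
    F = lop_combine_batch(member_probs_unlabeled, alphas)
    ambiguity = sum(a * mean_kl(F, p) for a, p in zip(alphas, member_probs_unlabeled))
    return np.dot(alphas, heldout_errors) - ambiguity

# heldout_errors[i] is member i's mean KL error on its held-out portion of the training set,
# and member_probs_unlabeled[i] holds member i's outputs on windows from unlabeled proteins.
# Evaluating this estimate every tenth epoch and keeping the ensemble with the lowest value
# implements the stopping-time selection described above.
```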
In these runs the validation error fluctuates by about ±3 % around the test error. However, the oscillations of the validation error follow the oscillations of the test-set error, as can be seen in Figure 1, so the validation error can still be used to find the lowest test error. The validation error is not always lowest when the test-set error is lowest, so a measure of the usability of the validation error for stopping is the average difference between the lowest test-set error and the test-set error at stopping time. For both types of ensembles this difference is as low as or close to 0.05 %, so the estimated generalization error can be used to find the right stopping time very accurately.

The cross-validation ensemble reaches a test error that is 1.5 % lower than for the simple ensemble, and a misclassification rate that is 1.6 % lower. The explanation is a larger ambiguity for the cross-validation ensemble, since the average test errors of the ensemble members are comparable. The ambiguity of the cross-validation ensemble is 1.5 times the ambiguity of the simple ensemble.

As noted in Section 2, the error of the combined predictor is always better than the average of the errors of the ensemble members. Still, one of the ensemble members could be better than the combined predictor. For the cross-validation ensemble method the difference between the average error of the ensemble members and the ensemble test error (the ambiguity) amounts to a gain of 12.8 %, which is substantial. The standard deviation of the test error over the ensemble members is 0.010, so the ambiguity is more than three times larger, and it is very unlikely that any ensemble member has a lower generalization error than the ensemble. For the simple ensemble method the ambiguity is smaller, corresponding to a gain of 8.4 %, but it is still more than three times the standard deviation among the ensemble members.

The lowest average error of the ensemble members does not have to occur when the ensemble error is lowest. Typically the lowest average generalization error of the ensemble members is reached before the lowest generalization error of the ensemble, so the ensemble can actually gain from overfitting in the individual ensemble members. This effect can be seen in Figure 1. Also, the optimal architecture for a single predictor is often smaller than for ensemble members. A number of single predictors with different hidden layer sizes have been trained, with the number of hidden nodes varied from 3 to 400. The training must be done with a separate validation set, since there is no ambiguity for a single predictor. The best result is achieved with 10 hidden nodes, giving an average generalization error and misclassification rate that are respectively 2.0 % and 2.5 % higher than for the cross-validation LOP ensemble.

A standard cross-validation ensemble using the MSE error function and the LAP combination rule is trained on the same data as the cross-validation LOP ensemble. The validation error is calculated using (3) even though the outputs do not necessarily sum to one. The validation error has lost its meaning as an error, e.g. it can be negative, but it is still valid as an early-stopping indicator. This is supported by the fact that the lowest misclassification rate on the test set is only 0.6 % lower than the misclassification rate on the test set at stopping time.
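For concreteness, the LAP combination and the misclassification rate used in these comparisons can be sketched as follows (our own helpers; the LOP counterpart is the lop_combine sketch given after equation (3)):

```python
import numpy as np

def lap_combine(member_probs, alphas):
    """Linear average predictor: weighted arithmetic mean of the member outputs."""
    return np.tensordot(alphas, member_probs, axes=1)

def misclassification_rate(probs, labels):
    """Fraction of examples whose most probable class differs from the true class."""
    return np.mean(np.argmax(probs, axis=-1) != labels)

# With member_probs of shape (M, examples, 3) and integer labels of shape (examples,),
# misclassification_rate(lap_combine(member_probs, alphas), labels) and the analogous
# call with the LOP combination give the two rates compared in the text.
```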
The generalization error for the standard ensemble is much higher when measured with the KL error function, about 2.5 times that of the cross-validation LOP ensemble. This is not a fair comparison, since the LOP ensemble is trained to minimize the KL error. Another measure is the misclassification rate.
For the standard ensemble the misclassification rate is 6.3 % higher than that of the cross-validation LOP ensemble. Surprisingly, the benefit is not in the combination rule itself. A test run where the LOP is replaced with the LAP, while training still uses the KL error, yields a generalization error that is essentially the same as for the LOP combination rule. The misclassification rate for the LAP is 1.0 % higher than for the LOP.

4 Conclusion

It was shown how the generalization error of an ensemble of predictors using a logarithmic opinion pool (LOP) can be estimated using cross-validation on the training set and an estimate of the ambiguity from an independent set of unlabeled data. When testing on prediction of protein secondary structure it was shown that this estimate follows the oscillations of the error measured on an independent test set. The estimated error can be used to stop training when it is at a minimum, and it was shown that the cross-validation LOP ensemble method is superior to single predictors and to standard ensemble methods using the mean square error function on the protein problem. The benefit lies not so much in the combination rule as in the use of the Kullback-Leibler error function and the target-independent ambiguity term.

5 Acknowledgments

We would like to thank Claus Andersen and Ole Lund at CBS for sharing their protein data set. This work was supported by the Danish National Research Foundation.

References

[1] B. Rost and C. Sander. Prediction of protein secondary structure at better than 70 % accuracy. Journal of Molecular Biology, 232(2), Jul 1993.
[2] S. K. Riis and A. Krogh. Improving prediction of protein secondary structure using structured neural networks and multiple sequence alignments. Journal of Computational Biology, 3, 1996.
[3] P. Baldi and S. Brunak. Bioinformatics - The Machine Learning Approach. MIT Press, Cambridge, MA, 1998.
[4] Anders Krogh and Jesper Vedelsby. Neural network ensembles, cross validation, and active learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems, volume 7. The MIT Press, 1995.
[5] Tom Heskes. Bias/variance decompositions for likelihood-based estimators. Neural Computation, 10(6), 1998.
[6] Tom Heskes. Selecting weighting factors in logarithmic opinion pools. In Michael I. Jordan, Michael J. Kearns, and Sara A. Solla, editors, Advances in Neural Information Processing Systems, volume 10. The MIT Press, 1998.
[7] O. Lund, K. Frimand, J. Gorodkin, H. Bohr, J. Bohr, J. Hansen, and S. Brunak. Protein distance constraints predicted by neural networks and probability density functions. Protein Engineering, 10(11), 1997.