A Deep Interpretation of Classifier Chains
1 A Deep Interpretation of Classifier Chains. Jesse Read and Jaakko Hollmén, Aalto University School of Science, Department of Information and Computer Science, and Helsinki Institute for Information Technology HIIT, Finland. Presented in Leuven, Belgium, Fri. 31st Oct. 2014. (J. Read and J. Hollmén, Aalto University; 17 slides.)
2 Multi-label Classification. x = (an example image); y = [y_1, ..., y_L] = [y_beach, y_sunset, y_foliage, y_field, y_mountain, y_urban] = [1, 0, 1, 0, 0, 0]
3 Multi-label Classification. Map D input variables (feature attributes) to L output variables (labels).

X_1 X_2 X_3 ... X_D | Y_1 Y_2 Y_3 Y_4
x_1 x_2 x_3 ... x_D | (known labels, training rows)
x_1 x_2 x_3 ... x_D |  ?   ?   ?   ?  (test instance)

Build a model h such that ŷ = [ŷ_1, ..., ŷ_L] = h(x).
4 Binary Relevance (BR). Train L independent models h = (h_1, ..., h_L), one for each label: each label is predicted from the input alone, x → y_1, x → y_2, ..., x → y_L. For x, predict ŷ = [ŷ_1, ..., ŷ_L] = [h_1(x), ..., h_L(x)] = h(x)
6 Binary Relevance (BR). Train L independent models h = (h_1, ..., h_L), one for each label.

Table: Binary Relevance, model ŷ_3 = h_3(x):

X_1 X_2 X_3 ... X_D | Y_1 Y_2 Y_3 Y_4
x_1 x_2 x_3 ... x_D | (known labels, training rows)
x_1 x_2 x_3 ... x_D |          ?      (Y_3 is predicted from x alone)

For x, predict ŷ = [ŷ_1, ..., ŷ_L] = [h_1(x), ..., h_L(x)] = h(x)

Consensus in the literature: we should model the relationships between labels!
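Binary relevance is simple enough to sketch directly. The following is a minimal illustrative implementation, not the authors' code: a tiny hand-rolled gradient-descent logistic regression serves as the base learner h_j so the sketch stays self-contained, and any binary classifier could be substituted.

```python
import numpy as np

class LogisticGD:
    """Tiny logistic regression trained by batch gradient descent (illustrative base learner)."""
    def __init__(self, lr=0.5, epochs=5000):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        Xb = np.hstack([X, np.ones((len(X), 1))])    # append a bias column
        self.w = np.zeros(Xb.shape[1])
        for _ in range(self.epochs):
            p = 1.0 / (1.0 + np.exp(-Xb @ self.w))   # sigmoid probabilities
            self.w -= self.lr * Xb.T @ (p - y) / len(y)
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        return (Xb @ self.w > 0).astype(int)

class BinaryRelevance:
    """h = (h_1, ..., h_L): L independent binary models, one per label."""
    def fit(self, X, Y):
        self.models = [LogisticGD().fit(X, Y[:, j]) for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        # yhat = [h_1(x), ..., h_L(x)]: each label predicted from x alone
        return np.column_stack([h.predict(X) for h in self.models])

# Toy data: y_1 = or(x_1, x_2), y_2 = and(x_1, x_2)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0, 0], [1, 0], [1, 0], [1, 1]])
Yhat = BinaryRelevance().fit(X, Y).predict(X)
```

On this toy data each label (or, and) is linearly separable on its own, so L independent models suffice; the xor example later in the talk shows where this breaks down.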
7 Classifier Chains (CC). Predictions are cascaded along a chain as additional features: x → y_1 → y_2 → ... → y_L. For x, predict ŷ = [ŷ_1, ..., ŷ_L] = [h_1(x), h_2(x, ŷ_1), ..., h_L(x, ŷ_1, ..., ŷ_{L-1})] = h(x)
9 Classifier Chains (CC). Predictions are cascaded along a chain as additional features.

Table: Classifier Chains, model ŷ_3 = h_3(x, ŷ_1, ŷ_2):

X_1 X_2 X_3 ... X_D | Y_1 Y_2 Y_3 Y_4
x_1 x_2 x_3 ... x_D | (known labels, training rows)
x_1 x_2 x_3 ... x_D | ŷ_1 ŷ_2  ?     (Y_3 is predicted from x plus the earlier predictions)

For x, predict ŷ = [ŷ_1, ..., ŷ_L] = [h_1(x), h_2(x, ŷ_1), ..., h_L(x, ŷ_1, ..., ŷ_{L-1})] = h(x)

Typically better performance than BR, with similar running time.
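A classifier chain can be sketched in the same illustrative style (again, not the authors' code): at training time h_j sees the true labels y_1, ..., y_{j-1} as extra features, and at test time the chain's own greedy predictions are cascaded forward.

```python
import numpy as np

class LogisticGD:
    """Tiny logistic regression trained by batch gradient descent (illustrative base learner)."""
    def __init__(self, lr=0.5, epochs=5000):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        self.w = np.zeros(Xb.shape[1])
        for _ in range(self.epochs):
            p = 1.0 / (1.0 + np.exp(-Xb @ self.w))
            self.w -= self.lr * Xb.T @ (p - y) / len(y)
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        return (Xb @ self.w > 0).astype(int)

class ClassifierChain:
    """h_j is trained on x plus the labels before it in the (default) chain order."""
    def fit(self, X, Y):
        self.L = Y.shape[1]
        self.models = []
        for j in range(self.L):
            Xj = np.hstack([X, Y[:, :j]])   # true labels y_1..y_{j-1} as extra features
            self.models.append(LogisticGD().fit(Xj, Y[:, j]))
        return self

    def predict(self, X):
        # Greedy inference: cascade the chain's own predictions forward
        Yhat = np.zeros((len(X), self.L), dtype=int)
        for j, h in enumerate(self.models):
            Yhat[:, j] = h.predict(np.hstack([X, Yhat[:, :j]]))
        return Yhat

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0, 0], [1, 0], [1, 0], [1, 1]])   # y_1 = or, y_2 = and
Yhat = ClassifierChain().fit(X, Y).predict(X)
```

Training on true labels while predicting from cascaded estimates is the standard greedy CC scheme; errors made early in the chain can therefore propagate, which is one motivation for the ensemble and ordering strategies discussed next.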
11 Development of Classifier Chains. If we change the order of the labels, predictive performance changes. Options:
- use the default chain;
- use several random chains in an ensemble;
- choose chains based on performance (good accuracy, but expensive);
- order the chain according to some heuristic, e.g., label dependence!
13 Improvements to Classifier Chains. Measure dependence via: empirical co-occurrence frequencies, pruned frequency counts, correlation coefficients, mutual information (tested with statistical significance tests), maximum-spanning-tree algorithms, probabilistic graphical model software, or dependence among errors (to get conditional label dependence); test different chain orders with a hold-out set / internal cross-validation; search the chain space with Monte Carlo search, simulated annealing, beam search, A* search; and then create: fully-cascaded classifier chains, populations of chains, partially-connected chains, singly-linked chains, trees, directed graphs, undirected graphs, undirected chains, ensembles of trees, ensembles of graphs...

Performance: still ≈ an ensemble of random chains (or too expensive).
14 Classifier Chains: Label Dependence. We may find that the labels are marginally dependent (Y_1 → Y_2):
E(Y_2 | Y_1) ≠ E(Y_2)
But if, given the input, the labels are conditionally independent (X → Y_1, X → Y_2):
E(Y_2 | Y_1, X) = E(Y_2 | X)
then are independent classifiers (BR) equivalent to classifier chains (CC)?
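The marginal-vs-conditional distinction can be checked numerically. In this deliberately extreme toy construction (mine, not from the slides), both labels deterministically copy the input bit X, so Y_1 and Y_2 are strongly dependent marginally yet conditionally independent given X.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=100_000)   # a single binary input variable
Y1 = X.copy()                          # Y_1 depends only on X
Y2 = X.copy()                          # Y_2 depends only on X

# Marginally, Y_2 looks dependent on Y_1: E(Y_2 | Y_1 = 1) != E(Y_2)
e_y2 = Y2.mean()
e_y2_given_y1 = Y2[Y1 == 1].mean()

# But conditioned on X, knowing Y_1 adds nothing: E(Y_2 | Y_1, X) = E(Y_2 | X)
e_y2_given_x = Y2[X == 1].mean()
e_y2_given_x_y1 = Y2[(X == 1) & (Y1 == 1)].mean()

print(e_y2, e_y2_given_y1)            # roughly 0.5 vs exactly 1.0
print(e_y2_given_x, e_y2_given_x_y1)  # both exactly 1.0
```

In this degenerate case a BR learner that models E(Y_j | X) well loses nothing by ignoring the other label, which is exactly the question the slide poses.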
15 Example: The xor Problem. Toy problem with labels or, and, xor:

X_1 X_2 | Y_1 (or) Y_2 (and) Y_3 (xor)
 0   0  |    0        0         0
 0   1  |    1        0         1
 1   0  |    1        0         1
 1   1  |    1        1         0

Clearly, E(Y_3 | Y_1, Y_2, X_1, X_2) = E(Y_3 | X_1, X_2), but...

Table: Results on 20 examples, h_j := logistic regression, default label order (Hamming accuracy and exact match, BR vs. CC).
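The effect can be reproduced in a few lines. This sketch takes the three labels to be or, and, and xor (my reading of the slide's toy problem) and uses plain gradient-descent logistic regression for each h_j: BR must predict xor from x alone, which is not linearly separable, while the chained h_3 also receives ŷ_1 (or) and ŷ_2 (and), in which xor = or ∧ ¬and is linear.

```python
import numpy as np

def fit_logreg(X, y, lr=0.5, epochs=5000):
    """Gradient-descent logistic regression; returns a weight vector (bias last)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0, 0, 0],   # columns: or, and, xor
              [1, 0, 1],
              [1, 0, 1],
              [1, 1, 0]])

# Binary relevance: each label from x alone (xor is not linearly separable)
br = np.column_stack([predict(fit_logreg(X, Y[:, j]), X) for j in range(3)])

# Classifier chain: h_3 also sees the predictions for or and and;
# xor = or AND NOT and, which is linear in those two features
cc = np.zeros_like(Y)
for j in range(3):
    w = fit_logreg(np.hstack([X, Y[:, :j]]), Y[:, j])   # true labels as training features
    cc[:, j] = predict(w, np.hstack([X, cc[:, :j]]))    # greedy predictions at test time

print((br == Y).mean(axis=0))   # the xor column stays below 1.0
print((cc == Y).mean(axis=0))
```

This matches the slide's point: the labels are conditionally independent given the input, yet CC beats BR because the linear base learner benefits from the chained features.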
16 Label Dependence ⇒ Chain Structure? The optimal structure / order is likelihood-dependent: it is whichever performs best! Just looking at the labels is not enough. We could model dependence between errors, rather than labels,
[ε_1, ε_2, ..., ε_L] = [(y_1 − h_1(x))², ..., (y_L − h_L(x))²]
(to take the input into account), but then we have to train h pairwise (or it is expensive), and this does not inform us of directionality! We could use an undirected graph, but then we lose greedy inference.
18 Classifier Chains as a Neural Network. From the point of view of Y_3 (xor), the predictions ŷ_1 and ŷ_2 fed forward along the chain form a hidden layer / feature-space projection of the input! This does not work if we swap Y_3 (xor) with Y_1 (or), since xor would then have to be predicted from x alone. The graph with both y_1 and y_2 in the hidden layer is enough for all labels!
20 Labels are Transformations of the Input. Labels are transformations of the input, which we learn from the training data. For example,
ŷ_2 = f_2(x) = σ(w · φ) = σ(w · [x_1, ..., x_D, h(x)])
(diagram: x feeds transformations f_1(·), f_2(·), f_3(·), which produce y_1 and y_2). For some f_k(x), ..., f_K(x), any label could be learned separately.
21 A Deep Neural Network. What if we replace the labels with hidden units? If we (1) expand x and invert the order of the layers, and (2) make the units hidden, then we are looking at a deep neural network: inputs x_1, ..., x_5 feed hidden units z_1, ..., z_4 (learned as transformations f_1(·), ..., f_4(·)), which feed the outputs y_1, y_2.
22 Learning Hidden Units. In the network x_1, ..., x_5 → z^[1] → z^[2] → y_1, y_2, we can learn the hidden layers z^[1] and z^[2] with Restricted Boltzmann Machines (for example), and then plug in back-propagation; or plug in a binary relevance learner:
ŷ = h(z^[2]) = h(f^[2](f^[1](x)))
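The last line, ŷ = h(z^[2]) = h(f^[2](f^[1](x))), can be imitated cheaply. This sketch substitutes fixed random tanh layers for the pre-trained f^[1], f^[2] (a stand-in of mine, not the authors' RBM-based method) and puts plain binary relevance, one logistic regression per label, on top: with a rich enough higher-level feature space, even linear BR handles xor.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_layer(n_in, n_out):
    """A fixed random tanh layer, standing in for a pre-trained hidden layer."""
    W = rng.normal(size=(n_in, n_out))
    b = rng.normal(size=n_out)
    return lambda X: np.tanh(X @ W + b)

def fit_logreg(X, y, lr=0.5, epochs=10000):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0, 0], [1, 1], [1, 1], [1, 0]])   # y_1 = or, y_2 = xor

f1 = random_layer(2, 16)
f2 = random_layer(16, 16)
Z = f2(f1(X))                                    # z[2] = f[2](f[1](x))

# Binary relevance on the transformed space: yhat = h(z[2])
Yhat = np.column_stack([predict(fit_logreg(Z, Y[:, j]), Z) for j in range(2)])
```

The random layers are of course not trained; the point is only that binary relevance becomes sufficient once the final-layer inputs form a useful nonlinear feature space, which is what the RBM pre-training (or back-propagation) is for.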
24 Results (rank on each dataset in parentheses):

Dataset | BR  | CC  | D-CC | D-BR | DBP
music   | (6) | (5) | (4)  | (2)  | (1)
scene   | (6) | (4) | (3)  | (1)  | (7)
yeast   | (5) | (2) | (1)  | (6)  | (3)
genbase | (3) | (1) | (1)  | (5)  | (5)
medical | (4) | (2) | (5)  | (6)  | (1)
enron   | (5) | (4) | (1)  | (2)  | (3)

D-h = deep structure + h-model on the final layer (SVM as base classifier).
- BR performs poorly (as usual)... but not if we learn higher-level features first! (D-BR, DBP)
- This structure + CC (D-CC) performs best of all.
- With the right features, we can use BR even with a linear learner.
25 Why?
- The mechanism behind classifier chains is not well understood.
- Investment in improvements to classifier chains is not being rewarded.
- There are disadvantages to classifier chains: complexity with many labels; what structure / directionality to use?; inflexibility (difficult to add / remove labels).
26 Reflections / Conclusions
- Classifier chains work as a deep structure: the other labels are supervised features.
- We can use binary relevance if the final-layer inputs are independent of each other (we achieved this using multiple layers of feature transformations).
- This has advantages, such as semi-supervised learning, and it becomes easier to add and drop labels.
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux INRIA 8 Nov 2011 Nicolas Le Roux (INRIA) Neural networks and optimization 8 Nov 2011 1 / 80 1 Introduction 2 Linear classifier 3 Convolutional neural networks
More informationMulti-label Active Learning with Auxiliary Learner
Multi-label Active Learning with Auxiliary Learner Chen-Wei Hung and Hsuan-Tien Lin National Taiwan University November 15, 2011 C.-W. Hung & H.-T. Lin (NTU) Multi-label AL w/ Auxiliary Learner 11/15/2011
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More information18.9 SUPPORT VECTOR MACHINES
744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the
More informationAn Ensemble of Bayesian Networks for Multilabel Classification
Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence An Ensemble of Bayesian Networks for Multilabel Classification Antonucci Alessandro, Giorgio Corani, Denis Mauá,
More informationGeneralization, Overfitting, and Model Selection
Generalization, Overfitting, and Model Selection Sample Complexity Results for Supervised Classification Maria-Florina (Nina) Balcan 10/03/2016 Two Core Aspects of Machine Learning Algorithm Design. How
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne
More information26 : Spectral GMs. Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G.
10-708: Probabilistic Graphical Models, Spring 2015 26 : Spectral GMs Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G. 1 Introduction A common task in machine learning is to work with
More information6.867 Machine Learning
6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of
More informationBayesian Deep Learning
Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference
More informationReplicated Softmax: an Undirected Topic Model. Stephen Turner
Replicated Softmax: an Undirected Topic Model Stephen Turner 1. Introduction 2. Replicated Softmax: A Generative Model of Word Counts 3. Evaluating Replicated Softmax as a Generative Model 4. Experimental
More informationNotes on Back Propagation in 4 Lines
Notes on Back Propagation in 4 Lines Lili Mou moull12@sei.pku.edu.cn March, 2015 Congratulations! You are reading the clearest explanation of forward and backward propagation I have ever seen. In this
More informationActive Learning for Sparse Bayesian Multilabel Classification
Active Learning for Sparse Bayesian Multilabel Classification Deepak Vasisht, MIT & IIT Delhi Andreas Domianou, University of Sheffield Manik Varma, MSR, India Ashish Kapoor, MSR, Redmond Multilabel Classification
More informationIntroduction to Neural Networks
CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/jv7vj9 Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationLecture 16 Deep Neural Generative Models
Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,
More informationMachine Learning for Data Science (CS4786) Lecture 24
Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More informationMachine Learning, Fall 2009: Midterm
10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all
More information