A Deep Interpretation of Classifier Chains

Size: px
Start display at page:

Download "A Deep Interpretation of Classifier Chains"

Transcription

1 A Deep Interpretation of Classifier Chains Jesse Read and Jaakko Holmén Aalto University School of Science, Department of Information and Computer Science and Helsinki Institute for Information Technology Finland Leuven, Belgium. Fri. 31st Oct., 2014 J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

2 Multi-label Classification x =, y = [y 1,..., y L ] = [y beach, y sunset, y foliage, y field, y mountain, y urban ] = [1, 0, 1, 0, 0, 0] J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

3 Multi-label Classification Map D input variables (feature attributes) to L output variables (labels). X 1 X 2 X 3... X D Y 1 Y 2 Y 3 Y 4 x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D???? Build model h, such that ŷ = [ŷ 1,..., ŷ L ] = h( x). J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

4 Binary Relevance (BR) Train L independent models h = (h 1,..., h L ), one for each label, x y 1 y 2 y 4 For x, predict ŷ = [ŷ 1,..., ŷ L ] = [h 1 ( x),..., h L ( x)] = h( x) J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

5 Binary Relevance (BR) Train L independent models h = (h 1,..., h L ), one for each label, For x, predict Table : Binary Relevance: Model ŷ 3 = h 3 ( x) X 1 X 2 X 3... X D Y 1 Y 2 Y 3 Y 4 x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D? ŷ = [ŷ 1,..., ŷ L ] = [h 1 ( x),..., h L ( x)] = h( x) J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

6 Binary Relevance (BR) Train L independent models h = (h 1,..., h L ), one for each label, For x, predict Table : Binary Relevance: Model ŷ 3 = h 3 ( x) X 1 X 2 X 3... X D Y 1 Y 2 Y 3 Y 4 x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D? ŷ = [ŷ 1,..., ŷ L ] = [h 1 ( x),..., h L ( x)] = h( x) Consensus in the literature: should model relationship between labels! J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

7 Classifier Chains (CC) Predictions are cascaded along a chain as additional features x y 1 y 2 y 4 For x, predict ŷ = [ŷ 1,..., ŷ L ] = [h 1 ( x), h 2 ( x, ŷ 1 ),..., h L ( x, ŷ 1,..., ŷ L 1 )] = h( x) J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

8 Classifier Chains (CC) Predictions are cascaded along a chain as additional features For x, predict Table : Classifier Chains: Model ŷ 3 = h 3 ( x, ŷ 1, ŷ 2 ) X 1 X 2 X 3... X D Y 1 Y 2 Y 3 Y 4 x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D ŷ 1 ŷ 2? ŷ = [ŷ 1,..., ŷ L ] = [h 1 ( x), h 2 ( x, ŷ 1 ),..., h L ( x, ŷ 1,..., ŷ L 1 )] = h( x) J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

9 Classifier Chains (CC) Predictions are cascaded along a chain as additional features For x, predict Table : Classifier Chains: Model ŷ 3 = h 3 ( x, ŷ 1, ŷ 2 ) X 1 X 2 X 3... X D Y 1 Y 2 Y 3 Y 4 x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D x 1 x 2 x 3... x D ŷ 1 ŷ 2? ŷ = [ŷ 1,..., ŷ L ] = [h 1 ( x), h 2 ( x, ŷ 1 ),..., h L ( x, ŷ 1,..., ŷ L 1 )] = h( x) Typically better performance than BR, similar running time J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

10 Development of Classifier Chains If we change the order of labels, predictive performance is different. Use the default chain Use several random chains in ensemble Use chains based on performance (good accuracy, but expensive) Order the chain according to some heuristic, J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

11 Development of Classifier Chains If we change the order of labels, predictive performance is different. Use the default chain Use several random chains in ensemble Use chains based on performance (good accuracy, but expensive) Order the chain according to some heuristic, e.g., such as label dependence! J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

12 Improvements to Classifier Chains Measure dependence via empirical co-occurrence frequencies, pruned frequency counts, correlation coefficient, mutual information, tested with statistical significance tests, maximum spanning tree algorithm, probabilistic graphical model software, dependence among errors (to get conditional label dependence), test different chain orders with hold-out set / internal cross validation and search the chain space with Monte Carlo search, simulated annealing, beam search, A* search and then create, fully-cascaded classifier chains, population of chains, partially-connected chains, singly-linked chains, trees, directed graphs, undirected graphs, undirected chains, ensemble of trees, ensemble of graphs J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

13 Improvements to Classifier Chains Measure dependence via empirical co-occurrence frequencies, pruned frequency counts, correlation coefficient, mutual information, tested with statistical significance tests, maximum spanning tree algorithm, probabilistic graphical model software, dependence among errors (to get conditional label dependence), test different chain orders with hold-out set / internal cross validation and search the chain space with Monte Carlo search, simulated annealing, beam search, A* search and then create, fully-cascaded classifier chains, population of chains, partially-connected chains, singly-linked chains, trees, directed graphs, undirected graphs, undirected chains, ensemble of trees, ensemble of graphs Performance still ensemble of random chains (or too expensive) J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

14 Classifier Chains: Label Dependence We may find that... Y 1 Y 2 E(Y 2 Y 1 ) = E(Y 2 ) But if, given the input, the labels are independent, X Y 1 Y 2 E(Y 2 Y 1, X ) = E(Y 2 X ) then independent classifiers (BR) classifier chains (CC)? J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

15 Example: The xor Problem Toy problem, or and xor X 1 X 2 Y 1 Y 2 Y Clearly, E(Y 3 Y 1, Y 2, X 1, X 2 ) = E(Y 3 X 1, X 2 ), but... Table : Results: 20 examples, h j := logistic regression, default label order. Measure BR CC Hamming acc Exact match J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

16 Label Dependence Chain Structure? The optimal structure / order is likelihood dependent the one which performs best! Just looking at the labels is not enough We could model dependence between errors, rather than labels, [ɛ 1, ɛ 2,..., ɛ L ] = [(y 1 h 1 (x)) 2,..., (y L h L (x)) 2 ] (to take into account the input) but we have to train h pairwise only (or expensive), and does not inform us of directionality! We could use an undirected graph, but then we lose greedy inference. J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

17 Classifier Chains as a Neural Network From the point of view of Y 3 (xor), x x y 1 x y 2 y 1 y 2 y 1 y 2 = = A hidden layer / feature space projection! does not work if we swap Y 3 (xor) with Y 1 (or) x y 1 J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

18 Classifier Chains as a Neural Network From the point of view of Y 3 (xor), x x y 1 x x y 2 y 1 y 2 y 1 y 2 y 1 y 2 = = A hidden layer / feature space projection! does not work if we swap Y 3 (xor) with Y 1 (or) x y 1 The third graph is enough for all labels! J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

19 Labels are Transformations of the Input Labels are transformations of the input, which we learn from the training data. For example, ŷ 2 = f 2 (x) = σ(w φ) = σ(w [x 1,..., x D, h(x)] x x f 1 ( ) f 2 ( ) f 1 ( ) f 2 ( ) f 3 ( ) For some fk (x),..., f K (x), any label could be learned separately. J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

20 Labels are Transformations of the Input Labels are transformations of the input, which we learn from the training data. For example, ŷ 2 = f 2 (x) = σ(w φ) = σ(w [x 1,..., x D, h(x)] x f 1 ( ) f 2 ( ) f 3 ( ) y 1 y 2 For some fk (x),..., f K (x), any label could be learned separately. J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

21 A Deep Neural Network What if we replace labels with hidden units? If we 1 (expand x, and invert the order of layers) and 2 make linear units, then we are looking at a deep neural network, y 1 y 2 y 1 y 2 z 1 z 2 z 3 z 4 f 1 ( ) f 2 ( ) f 3 ( ) f 4 ( ) z 1 z 2 z 3 z 4 x 1 x 2 x 3 x 4 x 5 x 1 x 2 x 3 x 4 x 5 J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

22 Learning Hidden Units y 1 y 2 z 1 z 2 z 3 z 4 z 1 z 2 z 3 z 4 x 1 x 2 x 3 x 4 x 5 We can learn the hidden layers z [1] and z [2] with Restricted Boltzmann Machines (for example), and then plug in back propagation; or plug in a binary relevance learner ŷ = h(z [2] ) = h(f [2] (f [1] ( x))) J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

23 Results Dataset BR CC D CC D BR DBP music (6) (5) (4) (2) (1) scene (6) (4) (3) (1) (7) yeast (5) (2) (1) (6) (3) genbase (3) (1) (1) (5) (5) medical (4) (2) (5) (6) (1) enron (5) (4) (1) (2) (3) avg. rank D h = Deep Structure + h-model on final layer (SVM as base classifier). BR performs poorly (as usual)... but not if we learn higher-level features first! (D BR, DBP) This structure + CC (D CC) performs best of all J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

24 Results Dataset BR CC D CC D BR DBP music (6) (5) (4) (2) (1) scene (6) (4) (3) (1) (7) yeast (5) (2) (1) (6) (3) genbase (3) (1) (1) (5) (5) medical (4) (2) (5) (6) (1) enron (5) (4) (1) (2) (3) avg. rank D h = Deep Structure + h-model on final layer (SVM as base classifier). BR performs poorly (as usual)... but not if we learn higher-level features first! (D BR, DBP) This structure + CC (D CC) performs best of all With the right features, we can use BR even with a linear learner J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

25 Why Mechanism behind classifier chains not well understood Investment in improvements to classifier chains not being rewarded There are disadvantages to classifier chains: complexity with many labels what structure/directionality to use? inflexible (difficult to add/remove labels) J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

26 Reflections / Conclusions Classifier chains work as a deep structure other labels are supervised features We can use binary relevance if final-layer inputs are independent of each other (we did this using multiple layers of feature transforms) This has advantages, such as semi-supervised learning, easier to add and drop labels J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

27 A Deep Interpretation of Classifier Chains Jesse Read and Jaakko Holmén Aalto University School of Science, Department of Information and Computer Science and Helsinki Institute for Information Technology Finland Leuven, Belgium. Fri. 31st Oct., 2014 J. Read and J. Holmén (Aalto University) A Deep Interpretation Classifier Chains Oct 31, / 17

Multi-label learning from batch and streaming data

Multi-label learning from batch and streaming data Multi-label learning from batch and streaming data Jesse Read Télécom ParisTech École Polytechnique Summer School on Mining Big and Complex Data 5 September 26 Ohrid, Macedonia Introduction x = Binary

More information

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?

More information

Classifier Chains for Multi-label Classification

Classifier Chains for Multi-label Classification Classifier Chains for Multi-label Classification Jesse Read, Bernhard Pfahringer, Geoff Holmes, Eibe Frank University of Waikato New Zealand ECML PKDD 2009, September 9, 2009. Bled, Slovenia J. Read, B.

More information

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

More information

Regret Analysis for Performance Metrics in Multi-Label Classification The Case of Hamming and Subset Zero-One Loss

Regret Analysis for Performance Metrics in Multi-Label Classification The Case of Hamming and Subset Zero-One Loss Regret Analysis for Performance Metrics in Multi-Label Classification The Case of Hamming and Subset Zero-One Loss Krzysztof Dembczyński 1, Willem Waegeman 2, Weiwei Cheng 1, and Eyke Hüllermeier 1 1 Knowledge

More information

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d

More information

Holdout and Cross-Validation Methods Overfitting Avoidance

Holdout and Cross-Validation Methods Overfitting Avoidance Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest

More information

Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian

More information

Final Exam, Machine Learning, Spring 2009

Final Exam, Machine Learning, Spring 2009 Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3

More information

Introduction to Machine Learning Midterm Exam

Introduction to Machine Learning Midterm Exam 10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Scalable robust hypothesis tests using graphical models

Scalable robust hypothesis tests using graphical models Scalable robust hypothesis tests using graphical models Umamahesh Srinivas ipal Group Meeting October 22, 2010 Binary hypothesis testing problem Random vector x = (x 1,...,x n ) R n generated from either

More information

Progressive Random k-labelsets for Cost-Sensitive Multi-Label Classification

Progressive Random k-labelsets for Cost-Sensitive Multi-Label Classification 1 26 Progressive Random k-labelsets for Cost-Sensitive Multi-Label Classification Yu-Ping Wu Hsuan-Tien Lin Department of Computer Science and Information Engineering, National Taiwan University, Taiwan

More information

Bayesian Networks Inference with Probabilistic Graphical Models

Bayesian Networks Inference with Probabilistic Graphical Models 4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Deep Feedforward Networks

Deep Feedforward Networks Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3

More information

IFT Lecture 7 Elements of statistical learning theory

IFT Lecture 7 Elements of statistical learning theory IFT 6085 - Lecture 7 Elements of statistical learning theory This version of the notes has not yet been thoroughly checked. Please report any bugs to the scribes or instructor. Scribe(s): Brady Neal and

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA

Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA with Hiroshi Morioka Dept of Computer Science University of Helsinki, Finland Facebook AI Summit, 13th June 2016 Abstract

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Article from Predictive Analytics and Futurism July 2016 Issue 13 Regression and Classification: A Deeper Look By Jeff Heaton Classification and regression are the two most common forms of models fitted

More information

VBM683 Machine Learning

VBM683 Machine Learning VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data

More information

Introduction to Machine Learning Midterm Exam Solutions

Introduction to Machine Learning Midterm Exam Solutions 10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

More information

Kernel Logistic Regression and the Import Vector Machine

Kernel Logistic Regression and the Import Vector Machine Kernel Logistic Regression and the Import Vector Machine Ji Zhu and Trevor Hastie Journal of Computational and Graphical Statistics, 2005 Presented by Mingtao Ding Duke University December 8, 2011 Mingtao

More information

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

PAC Learning Introduction to Machine Learning. Matt Gormley Lecture 14 March 5, 2018

PAC Learning Introduction to Machine Learning. Matt Gormley Lecture 14 March 5, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University PAC Learning Matt Gormley Lecture 14 March 5, 2018 1 ML Big Picture Learning Paradigms:

More information

Optimization of Classifier Chains via Conditional Likelihood Maximization

Optimization of Classifier Chains via Conditional Likelihood Maximization Optimization of Classifier Chains via Conditional Likelihood Maximization Lu Sun, Mineichi Kudo Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan Abstract

More information

Study Notes on the Latent Dirichlet Allocation

Study Notes on the Latent Dirichlet Allocation Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Neural Networks: Backpropagation

Neural Networks: Backpropagation Neural Networks: Backpropagation Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others

More information

Chapter 14 Combining Models

Chapter 14 Combining Models Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients

More information

Machine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang Machine Learning Basics Lecture 7: Multiclass Classification Princeton University COS 495 Instructor: Yingyu Liang Example: image classification indoor Indoor outdoor Example: image classification (multiclass)

More information

CS Machine Learning Qualifying Exam

CS Machine Learning Qualifying Exam CS Machine Learning Qualifying Exam Georgia Institute of Technology March 30, 2017 The exam is divided into four areas: Core, Statistical Methods and Models, Learning Theory, and Decision Processes. There

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Bagging and Other Ensemble Methods

Bagging and Other Ensemble Methods Bagging and Other Ensemble Methods Sargur N. Srihari srihari@buffalo.edu 1 Regularization Strategies 1. Parameter Norm Penalties 2. Norm Penalties as Constrained Optimization 3. Regularization and Underconstrained

More information

ABC-LogitBoost for Multi-Class Classification

ABC-LogitBoost for Multi-Class Classification Ping Li, Cornell University ABC-Boost BTRY 6520 Fall 2012 1 ABC-LogitBoost for Multi-Class Classification Ping Li Department of Statistical Science Cornell University 2 4 6 8 10 12 14 16 2 4 6 8 10 12

More information

Binary Classification, Multi-label Classification and Ranking: A Decision-theoretic Approach

Binary Classification, Multi-label Classification and Ranking: A Decision-theoretic Approach Binary Classification, Multi-label Classification and Ranking: A Decision-theoretic Approach Krzysztof Dembczyński and Wojciech Kot lowski Intelligent Decision Support Systems Laboratory (IDSS) Poznań

More information

Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning

Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire

More information

Least Squares Regression

Least Squares Regression E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

Neural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann

Neural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable

More information

Chapter 16. Structured Probabilistic Models for Deep Learning

Chapter 16. Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe

More information

Logistic Regression. COMP 527 Danushka Bollegala

Logistic Regression. COMP 527 Danushka Bollegala Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will

More information

Least Squares Regression

Least Squares Regression CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the

More information

Logistic Regression & Neural Networks

Logistic Regression & Neural Networks Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability

More information

CIS 520: Machine Learning Oct 09, Kernel Methods

CIS 520: Machine Learning Oct 09, Kernel Methods CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed

More information

ECE 5424: Introduction to Machine Learning

ECE 5424: Introduction to Machine Learning ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple

More information

Neural networks and optimization

Neural networks and optimization Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional

More information

The Origin of Deep Learning. Lili Mou Jan, 2015

The Origin of Deep Learning. Lili Mou Jan, 2015 The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets

More information

CS 590D Lecture Notes

CS 590D Lecture Notes CS 590D Lecture Notes David Wemhoener December 017 1 Introduction Figure 1: Various Separators We previously looked at the perceptron learning algorithm, which gives us a separator for linearly separable

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Machine Learning for Structured Prediction

Machine Learning for Structured Prediction Machine Learning for Structured Prediction Grzegorz Chrupa la National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Grzegorz Chrupa la (DCU) Machine Learning for

More information

Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I

Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I What We Did The Machine Learning Zoo Moving Forward M Magdon-Ismail CSCI 4100/6100 recap: Three Learning Principles Scientist 2

More information

CS 231A Section 1: Linear Algebra & Probability Review

CS 231A Section 1: Linear Algebra & Probability Review CS 231A Section 1: Linear Algebra & Probability Review 1 Topics Support Vector Machines Boosting Viola-Jones face detector Linear Algebra Review Notation Operations & Properties Matrix Calculus Probability

More information

Neural Networks and Deep Learning

Neural Networks and Deep Learning Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost

More information

ECE521 Lecture 7/8. Logistic Regression

ECE521 Lecture 7/8. Logistic Regression ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Evaluation requires to define performance measures to be optimized

Evaluation requires to define performance measures to be optimized Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation

More information

Reinforcement Learning as Classification Leveraging Modern Classifiers

Reinforcement Learning as Classification Leveraging Modern Classifiers Reinforcement Learning as Classification Leveraging Modern Classifiers Michail G. Lagoudakis and Ronald Parr Department of Computer Science Duke University Durham, NC 27708 Machine Learning Reductions

More information

Machine Learning Basics

Machine Learning Basics Security and Fairness of Deep Learning Machine Learning Basics Anupam Datta CMU Spring 2019 Image Classification Image Classification Image classification pipeline Input: A training set of N images, each

More information

PATTERN CLASSIFICATION

PATTERN CLASSIFICATION PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS

More information

Notes on Noise Contrastive Estimation (NCE)

Notes on Noise Contrastive Estimation (NCE) Notes on Noise Contrastive Estimation NCE) David Meyer dmm@{-4-5.net,uoregon.edu,...} March 0, 207 Introduction In this note we follow the notation used in [2]. Suppose X x, x 2,, x Td ) is a sample of

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Machine Learning Basics III

Machine Learning Basics III Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62 Outline 1 Classification Logistic Regression 2 Gradient Based Optimization Gradient

More information

Logistic Regression. Machine Learning Fall 2018

Logistic Regression. Machine Learning Fall 2018 Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes

More information

CS 6375 Machine Learning

CS 6375 Machine Learning CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.

More information

Probabilistic Graphical Models & Applications

Probabilistic Graphical Models & Applications Probabilistic Graphical Models & Applications Learning of Graphical Models Bjoern Andres and Bernt Schiele Max Planck Institute for Informatics The slides of today s lecture are authored by and shown with

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning More Approximate Inference Mark Schmidt University of British Columbia Winter 2018 Last Time: Approximate Inference We ve been discussing graphical models for density estimation,

More information

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous

More information

Machine Learning (CSE 446): Neural Networks

Machine Learning (CSE 446): Neural Networks Machine Learning (CSE 446): Neural Networks Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 6, 2017 1 / 22 Admin No Wednesday office hours for Noah; no lecture Friday. 2 /

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Mark Schmidt University of British Columbia Winter 2018 Last Time: Learning and Inference in DAGs We discussed learning in DAG models, log p(x W ) = n d log p(x i j x i pa(j),

More information

6.036 midterm review. Wednesday, March 18, 15

6.036 midterm review. Wednesday, March 18, 15 6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that

More information

FINAL: CS 6375 (Machine Learning) Fall 2014

FINAL: CS 6375 (Machine Learning) Fall 2014 FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for

More information

ECE521 Lectures 9 Fully Connected Neural Networks

ECE521 Lectures 9 Fully Connected Neural Networks ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance

More information

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26 Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar

More information

Learning with multiple models. Boosting.

Learning with multiple models. Boosting. CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models

More information

Course Structure. Psychology 452 Week 12: Deep Learning. Chapter 8 Discussion. Part I: Deep Learning: What and Why? Rufus. Rufus Processed By Fetch

Course Structure. Psychology 452 Week 12: Deep Learning. Chapter 8 Discussion. Part I: Deep Learning: What and Why? Rufus. Rufus Processed By Fetch Psychology 452 Week 12: Deep Learning What Is Deep Learning? Preliminary Ideas (that we already know!) The Restricted Boltzmann Machine (RBM) Many Layers of RBMs Pros and Cons of Deep Learning Course Structure

More information

Notes on Discriminant Functions and Optimal Classification

Notes on Discriminant Functions and Optimal Classification Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem

More information

Neural networks and optimization

Neural networks and optimization Neural networks and optimization Nicolas Le Roux INRIA 8 Nov 2011 Nicolas Le Roux (INRIA) Neural networks and optimization 8 Nov 2011 1 / 80 1 Introduction 2 Linear classifier 3 Convolutional neural networks

More information

Multi-label Active Learning with Auxiliary Learner

Multi-label Active Learning with Auxiliary Learner Multi-label Active Learning with Auxiliary Learner Chen-Wei Hung and Hsuan-Tien Lin National Taiwan University November 15, 2011 C.-W. Hung & H.-T. Lin (NTU) Multi-label AL w/ Auxiliary Learner 11/15/2011

More information

Jeff Howbert Introduction to Machine Learning Winter

Jeff Howbert Introduction to Machine Learning Winter Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable

More information

18.9 SUPPORT VECTOR MACHINES

18.9 SUPPORT VECTOR MACHINES 744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the

More information

An Ensemble of Bayesian Networks for Multilabel Classification

An Ensemble of Bayesian Networks for Multilabel Classification Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence An Ensemble of Bayesian Networks for Multilabel Classification Antonucci Alessandro, Giorgio Corani, Denis Mauá,

More information

Generalization, Overfitting, and Model Selection

Generalization, Overfitting, and Model Selection Generalization, Overfitting, and Model Selection Sample Complexity Results for Supervised Classification Maria-Florina (Nina) Balcan 10/03/2016 Two Core Aspects of Machine Learning Algorithm Design. How

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne

More information

26 : Spectral GMs. Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G.

26 : Spectral GMs. Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G. 10-708: Probabilistic Graphical Models, Spring 2015 26 : Spectral GMs Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G. 1 Introduction A common task in machine learning is to work with

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of

More information

Bayesian Deep Learning

Bayesian Deep Learning Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference

More information

Replicated Softmax: an Undirected Topic Model. Stephen Turner

Replicated Softmax: an Undirected Topic Model. Stephen Turner Replicated Softmax: an Undirected Topic Model Stephen Turner 1. Introduction 2. Replicated Softmax: A Generative Model of Word Counts 3. Evaluating Replicated Softmax as a Generative Model 4. Experimental

More information

Notes on Back Propagation in 4 Lines

Notes on Back Propagation in 4 Lines Notes on Back Propagation in 4 Lines Lili Mou moull12@sei.pku.edu.cn March, 2015 Congratulations! You are reading the clearest explanation of forward and backward propagation I have ever seen. In this

More information

Active Learning for Sparse Bayesian Multilabel Classification

Active Learning for Sparse Bayesian Multilabel Classification Active Learning for Sparse Bayesian Multilabel Classification Deepak Vasisht, MIT & IIT Delhi Andreas Domianou, University of Sheffield Manik Varma, MSR, India Ashish Kapoor, MSR, Redmond Multilabel Classification

More information

Introduction to Neural Networks

Introduction to Neural Networks CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character

More information

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima. http://goo.gl/jv7vj9 Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT

More information

Lecture 16 Deep Neural Generative Models

Lecture 16 Deep Neural Generative Models Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

Machine Learning for Data Science (CS4786) Lecture 24

Machine Learning for Data Science (CS4786) Lecture 24 Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each

More information

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017 CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information