Discriminative Learning of Sum-Product Networks. Robert Gens, Pedro Domingos
1 Discriminative Learning of Sum-Product Networks. Robert Gens, Pedro Domingos (NIPS 2012)
2-9 [Figure slides: a model over variables X1-X6 is built up layer by layer, with slide titles Distributions, Features, Features, Mixtures, Parts, Parts, Pooling; diagrams only, no further recoverable text]
10-11 Outline: Motivation, SPN Review, Discriminative Training, Experiments. [Outline schematic omitted]
12 Graphical models: computing the partition function Z is generally intractable. SPNs perform fast, exact inference on high-treewidth models.
13 Deep architectures: inference is likewise intractable across many layers. SPNs have full probabilistic semantics and tractable inference over many layers.
14 Discriminative learning: SPNs combine features with fast, exact inference over high-treewidth models (this work, NIPS 2012).
15 [Outline slide repeated: now covering SPN Review]
16 [Figure slide: the X1-X6 model diagram, repeated]
17 A univariate distribution is an SPN (e.g., multinomial, Gaussian, Poisson).
18 A product of SPNs over disjoint variables is an SPN.
19 A weighted sum of SPNs over the same variables is an SPN; the sum implicitly sums out a mixture variable.
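Taken together, slides 17-19 give the recursive definition of an SPN. A minimal Python sketch of that definition, with Bernoulli leaves; the class names, structure, and example weights are illustrative, not the paper's implementation:

```python
class Leaf:
    """Univariate distribution over one variable (here a Bernoulli)."""
    def __init__(self, var, p):
        self.var, self.p = var, p

    def value(self, assignment):
        # assignment maps var -> 0/1; None means "marginalize this leaf
        # out", in which case the leaf integrates to 1.
        x = assignment.get(self.var)
        if x is None:
            return 1.0
        return self.p if x == 1 else 1.0 - self.p


class Product:
    """Product of child SPNs over disjoint sets of variables."""
    def __init__(self, children):
        self.children = children

    def value(self, assignment):
        v = 1.0
        for c in self.children:
            v *= c.value(assignment)
        return v


class Sum:
    """Weighted sum of child SPNs over the same set of variables."""
    def __init__(self, weighted_children):  # list of (weight, child) pairs
        self.weighted_children = weighted_children

    def value(self, assignment):
        return sum(w * c.value(assignment) for w, c in self.weighted_children)


# A mixture of two product distributions over {X, Y}: the sum node
# implicitly sums out a hidden mixture variable, as slide 19 notes.
spn = Sum([(0.7, Product([Leaf('X', 0.9), Leaf('Y', 0.2)])),
           (0.3, Product([Leaf('X', 0.1), Leaf('Y', 0.8)]))])
print(spn.value({'X': 1, 'Y': 0}))  # 0.7*0.9*0.8 + 0.3*0.1*0.2 = 0.51
```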
20 [Figure slide: the X1-X6 model diagram, repeated]
21-27 All marginals are computable in linear time: instantiate the evidence at the corresponding leaves, set the leaves of marginalized variables to 1, and evaluate the network in one bottom-up pass; the root value equals the desired marginal. [Seven animation frames showing evidence and marginalization on the example network over X and Y]
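Under this scheme a marginal or conditional query is a single evaluation of the network. Continuing the sketch above (probabilities again illustrative):

```python
# Evidence leaves return their likelihood; marginalized leaves return 1,
# so one bottom-up pass yields any marginal in time linear in network size.
p_joint = spn.value({'X': 1, 'Y': 0})     # P(X=1, Y=0)
p_x     = spn.value({'X': 1, 'Y': None})  # P(X=1), with Y summed out
p_cond  = p_joint / p_x                   # P(Y=0 | X=1)
```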
28-33 All MAP states are computable in linear time: replace each sum node by a max node, evaluate bottom-up with the evidence instantiated (root value 0.12 in the running example), then trace downward from the root, following the winning child of each max node, to read off the most probable state (the mode). [Six animation frames on the example network over X and Y]
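A sketch of the max-product pass on the same toy classes (again illustrative, not the paper's implementation): sums become maxes on the way up, and a downward pass follows the recorded winners. Recomputing child values keeps the sketch short; a real implementation would cache them.

```python
def map_value(node, assignment):
    """Upward pass with max in place of sum; records each max node's winner."""
    if isinstance(node, Leaf):
        x = assignment.get(node.var)
        if x is None:                        # free variable: take its mode
            return max(node.p, 1.0 - node.p)
        return node.p if x == 1 else 1.0 - node.p
    if isinstance(node, Product):
        v = 1.0
        for c in node.children:
            v *= map_value(c, assignment)
        return v
    # Sum node acting as a max node: keep the best weighted child.
    best_w, best_c = max(node.weighted_children,
                         key=lambda wc: wc[0] * map_value(wc[1], assignment))
    node.winner = best_c
    return best_w * map_value(best_c, assignment)

def map_state(node, assignment, state):
    """Downward pass: follow winners at max nodes, all children at products."""
    if isinstance(node, Leaf):
        x = assignment.get(node.var)
        state[node.var] = x if x is not None else int(node.p >= 0.5)
    elif isinstance(node, Product):
        for c in node.children:
            map_state(c, assignment, state)
    else:
        map_state(node.winner, assignment, state)

state = {}
map_value(spn, {'X': 1, 'Y': None})        # upward max pass, records winners
map_state(spn, {'X': 1, 'Y': None}, state)
print(state)                                # {'X': 1, 'Y': 0} for the example
```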
34 Special cases of SPNs: junction trees, hierarchical mixture models, non-recursive probabilistic context-free grammars, models with context-specific independence, models with determinism, and other high-treewidth models.
35 [Venn diagram relating compactly representable probability distributions, graphical models, sum-product networks, and existing tractable models]
36 Learning SPNs: weight updates are organized by algorithm (generative EM, generative gradient, discriminative gradient) and by the inference used (soft inference over marginals vs. hard inference over MAP states). (Poon & Domingos, UAI 2011)
37 [Outline slide repeated: now covering Discriminative Training]
38-42 Discriminative SPNs split the variables into query Y, hidden H, and evidence X. The evidence is treated as a constant and enters only through non-negative features f(X) at the leaves, which admits a greater variety of leaf functions than generative SPNs.
43 Discriminative training maximizes the conditional log-likelihood log P(Y|X). This stays tractable: the gradient is the difference between a term evaluated with the correct label clamped and a term evaluated at the model's best guess.
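One way to write the quantity the slide calls tractable, assuming S denotes the unnormalized SPN value and h the hidden variables (a hedged reconstruction of the slide's equation, not copied from the paper):

```latex
\frac{\partial}{\partial w}\log P(Y\mid X)
  = \frac{\partial}{\partial w}\log \sum_{h} S(Y,h,X)
  \;-\; \frac{\partial}{\partial w}\log \sum_{y,h} S(y,h,X)
```

Both terms are single SPN evaluations, so no separate partition function needs to be approximated.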
44-51 SPN backpropagation: after the bottom-up evaluation, the partial derivatives ∂S/∂S_j propagate in one top-down pass. For each child j of a sum node n: ∂S/∂S_j += w_{nj} · ∂S/∂S_n. For each child j of a product node n: ∂S/∂S_j += ∂S/∂S_n · ∏_{k ∈ ch(n)\{j}} S_k.
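A sketch of these two rules on the toy classes from earlier. It assumes a tree-structured network (each node has one parent); a DAG would need a single backward sweep in topological order instead of recursion.

```python
def backprop(root, assignment):
    """One upward pass caching node values, then one downward pass
    accumulating grads[n] = dS/dS_n via the sum and product rules."""
    values = {}

    def forward(node):
        if isinstance(node, Leaf):
            v = node.value(assignment)
        elif isinstance(node, Product):
            v = 1.0
            for c in node.children:
                v *= forward(c)
        else:  # Sum
            v = sum(w * forward(c) for w, c in node.weighted_children)
        values[id(node)] = v
        return v

    forward(root)
    grads = {id(root): 1.0}  # dS/dS_root = 1

    def backward(node):
        g = grads[id(node)]
        if isinstance(node, Sum):
            for w, c in node.weighted_children:
                # sum rule: dS/dS_j += w_nj * dS/dS_n
                grads[id(c)] = grads.get(id(c), 0.0) + w * g
                backward(c)
        elif isinstance(node, Product):
            for c in node.children:
                # product rule: dS/dS_j += dS/dS_n * product of siblings
                sib = 1.0
                for k in node.children:
                    if k is not c:
                        sib *= values[id(k)]
                grads[id(c)] = grads.get(id(c), 0.0) + g * sib
                backward(c)

    backward(root)
    # Weight gradients follow as dS/dw_nj = grads[n] * values[j] on sum edges.
    return values, grads
```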
52 Problem with backpropagation: gradient diffusion. The gradient signal decays rapidly as it passes through many layers of sum nodes, so deep SPNs learn slowly.
53 Hard inference overcomes gradient diffusion: replace soft inference (marginals) with hard inference (MAP states) when computing the gradient.
54 Reasons to use hard inference: to overcome gradient diffusion; when the goal is to predict the most probable structure; for speed or tractability.
55-61 The hard gradient: run MAP inference once with the correct label clamped and once with the label free (the model's best guess), then update each sum-node weight in proportion to the difference between the number of times its edge is traversed in the two MAP derivation trees (# with correct label minus # with model guess). [Animation frames on the two-label example network omitted]
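A minimal sketch of this update, reusing the map_value routine above and the .winner attributes it records (helper names are ours; applying the deltas to the weight lists is omitted for brevity):

```python
def edge_counts(node, counts):
    """Count each sum-node edge traversed in the MAP derivation tree
    recorded by the most recent map_value call."""
    if isinstance(node, Sum):
        e = (id(node), id(node.winner))
        counts[e] = counts.get(e, 0) + 1
        edge_counts(node.winner, counts)
    elif isinstance(node, Product):
        for c in node.children:
            edge_counts(c, counts)

def hard_gradient_step(spn, x_evidence, y_label, eta=0.1):
    """Delta w_e proportional to (count with correct label clamped)
    minus (count with the model's best guess)."""
    c_true, c_guess = {}, {}
    map_value(spn, {**x_evidence, **y_label})                 # clamp label
    edge_counts(spn, c_true)
    map_value(spn, {**x_evidence, **{v: None for v in y_label}})  # free label
    edge_counts(spn, c_guess)
    return {e: eta * (c_true.get(e, 0) - c_guess.get(e, 0))
            for e in set(c_true) | set(c_guess)}
```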
62-64 Learning SPNs, summary of weight updates (S is the root value, S_i the child value under weight w_i, c_i the number of times edge i is traversed in a MAP state):
Generative EM: soft inference w_i ∝ w_i (∂S/∂S_k) S_i; hard inference w_i ∝ c_i.
Generative gradient: soft inference Δw_i ∝ ∂ log S / ∂w_i.
Discriminative gradient: soft inference Δw_i ∝ ∂ log S(true label)/∂w_i − ∂ log S(exp. label)/∂w_i; hard inference Δw_i ∝ c_i(true) − c_i(guess).
65 [Outline slide repeated: now covering Experiments]
66 Image classification benchmarks. CIFAR-10: 32x32 px, 50k train, 10k test. STL-10: 96x96 px, 5k train across 10 folds, 8k test, 100k unlabeled.
67 Feature extraction follows Coates et al. (AISTATS 2011): 6x6 patches encoded against a K-means dictionary of size K with triangle encoding, then max-pooled; a 32x32 image yields a 27x27xK encoding, pooled to a GxGxK grid.
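The triangle encoding step can be sketched in a few lines of NumPy (dictionary learning and pooling omitted; the function name is ours):

```python
import numpy as np

def triangle_encode(patches, centroids):
    """Triangle (soft) K-means encoding from Coates et al., AISTATS 2011:
    f_k(x) = max(0, mean_j ||x - c_j|| - ||x - c_k||).
    patches: (n, d) whitened patches; centroids: (K, d). Returns (n, K)."""
    z = np.linalg.norm(patches[:, None, :] - centroids[None, :, :], axis=2)
    mu = z.mean(axis=1, keepdims=True)   # average distance per patch
    return np.maximum(0.0, mu - z)       # nonzero only for close centroids
```

Note the encoding is non-negative by construction, matching the requirement on SPN leaf features from slides 38-42.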
68 [SPN architecture diagram: a sum node over classes, products over parts, sums over part mixtures and locations; leaves score WxWxK windows of the GxGxK feature grid via exponentiated inner products e^(x̃_ij · f̃_cpt)]
69 [CIFAR-10 results plot: accuracy (64-84%) versus dictionary size K with 4x4xK pooled features; the SPN curve lies above the SVM, autoencoder, and RBM baselines]
70 [CIFAR-10 results plot, annotated: the SPN matches the SVM's accuracy with 20x fewer features; the SPN models spatial structure among features]
71 [CIFAR-10 results plot: accuracy (79-84%) versus number of features (0k-150k); with 7x7x400 features the SPN surpasses the best published results, including learned pooling and a 3-layer learned receptive field, as well as the SVM baseline]
72 STL-10 results, without unlabeled data: 1-layer vector quantization 54.9%; 1-layer sparse coding 59.0%; 3-layer learned receptive field 60.1%; discriminative SPN 62.3%.
73 Future work: max-margin SPNs; learning SPN structure; applying discriminative SPNs to structured prediction; approximate inference using SPNs.
74 Summary (Poster W15): Discriminative SPNs combine the advantages of tractable inference, deep architectures, and discriminative learning. The hard gradient combats gradient diffusion in deep models. Discriminative SPNs outperform SVMs and deep models on image classification benchmarks.