Depth Qualification Exam Presentation
- Philip Wilkinson
- 6 years ago
1 Depth Qualification Exam Presentation
Li Wan, Dept. of Computer Science, Courant Institute, New York University
Depth Qualification Exam Presentation p. 1/29
2 Overview of Talk
1. Literature survey
(a) Approaches to training deep neural networks
(b) Topic models and their application to computer vision
2. Research result: how to effectively combine a neural network model with a topic model
3 Simple Neural Network
Non-linear activation function: $h_i = f(w_i^\top x + b)$, where $f$ can be the sigmoid function $\sigma(x) = 1/(1 + \exp(-x))$ or the tanh function (outputs on the $(0,1)$ or $(-1,1)$ scale, respectively).
Figure: a simple neural network [1].
[1] Bishop, Neural Networks for Pattern Recognition, 1995
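A minimal sketch of the unit computation above in plain Python. It assumes one weight row $w_i$ and one bias $b_i$ per hidden unit (the slide writes a single $b$); `layer_forward` is a hypothetical helper name, not from the source.

```python
import math


def sigmoid(a):
    """Logistic function sigma(a) = 1 / (1 + exp(-a)), with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def layer_forward(W, b, x, f=sigmoid):
    """Hidden activations h_i = f(w_i^T x + b_i) for every row w_i of W.

    W is a list of weight rows, b a list of per-unit biases (an assumption:
    the slide uses a single bias b), x the input vector, f the activation.
    """
    return [f(sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) + b_i)
            for w_i, b_i in zip(W, b)]


W = [[0.5, -1.0], [2.0, 0.0]]
b = [0.0, -1.0]
x = [2.0, 1.0]
print(layer_forward(W, b, x))             # sigmoid units, values in (0, 1)
print(layer_forward(W, b, x, math.tanh))  # tanh units, values in (-1, 1)
```

Passing the activation as a parameter keeps the same layer code for both choices of $f$ mentioned on the slide.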
4 Deep Neural Networks
Deep networks are better than shallow ones. However, standard random initialization leads to poor training and generalization error in deep neural networks [1][3], with the exception of deep convolutional networks (ConvNets) [2].
[1] Bengio et al. Greedy layer-wise training of deep networks, NIPS, 2007
[2] LeCun et al. Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation
[3] Hinton and Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, 2006
[4] Glorot and Bengio, Understanding the difficulty of training deep feedforward neural networks, AISTATS, 2010
5 Pre-training Deep Neural Networks
How to pre-train deep neural networks: greedy layer-wise pre-training [1] ($x$: input data, $h$: hidden layer, $w$: parameters).
Generative model [2][3] (restricted Boltzmann machines): $w = \arg\max_w p(x, h; w)$
Encoder ($f$)-decoder ($g$) model [4][5][6][7]: $w = \arg\min_{w,h} \|h - f(x; w)\| + \|x - g(h; w)\| + \lambda \|w\|_1$
[1] Hinton et al. A fast learning algorithm for deep belief nets, Neural Computation, 2006
[2] Hinton et al. Training products of experts by minimizing contrastive divergence, Neural Computation
[3] Hinton and Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, 2006
[4] Ranzato et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition, CVPR, 2007
[5] Gregor and LeCun, Learning Fast Approximations of Sparse Coding, ICML, 2010
[6] Ranzato et al., Efficient learning of sparse representations with an energy-based model, NIPS
[7] Ranzato et al., Sparse Feature Learning for Deep Belief Networks, NIPS
6 Restricted Boltzmann Machines
Undirected graphical model (bipartite graph between visible units $U = \{x\}$ and hidden units $V = \{h\}$) with energy function (binary case):
$E(x, h) = -b^\top x - c^\top h - h^\top W x$
Fast inference: $p(h \mid x; w) = \prod_j p(h_j \mid x; w)$, with $p(h_j = 1 \mid x; w) = \sigma(w_j x + c_j)$, where $w_j$ is the $j$-th row of $W$.
Fast sampling: $p(x \mid h; w) = \prod_i p(x_i \mid h; w)$, with $p(x_i = 1 \mid h; w) = \sigma(w_i^\top h + b_i)$, where $w_i$ is the $i$-th column of $W$.
Neural network feed-forward operation: $f(x; w) = p(h \mid x; w)$.
Initialize $W$ in the neural network by maximizing $p(x; w) = \sum_h p(x, h; w)$, whose gradient gives the update:
$\Delta W \propto \mathbb{E}_{\text{data}}[h x^\top] - \mathbb{E}_{\text{model}}[h x^\top]$
However, $\mathbb{E}_{\text{model}}[h x^\top]$ is intractable [1] because the number of possible configurations of $h$ is exponential in its size. Contrastive Divergence [2][3] and its extensions [4] approximate the model expectation with a few samples.
[1] Long et al. Restricted Boltzmann Machines are Hard to Approximately Evaluate or Simulate
[2] Hinton et al. Training products of experts by minimizing contrastive divergence, Neural Computation
[3] Bengio and Delalleau, Justifying and Generalizing Contrastive Divergence, Neural Computation
[4] Nair and Hinton, Rectified Linear Units Improve Restricted Boltzmann Machines, ICML, 2010
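A minimal sketch of one contrastive-divergence (CD-1) update for a binary RBM, using the energy and conditionals above: the intractable $\mathbb{E}_{\text{model}}[h x^\top]$ is replaced by statistics from a single Gibbs step started at the data. The helper names, learning rate, initialization scale and toy data are assumptions for illustration only.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def p_h_given_x(W, c, x):
    """p(h_j = 1 | x) = sigma(w_j x + c_j), where w_j is row j of W."""
    return [sigmoid(sum(w * xi for w, xi in zip(w_j, x)) + c_j)
            for w_j, c_j in zip(W, c)]


def p_x_given_h(W, b, h):
    """p(x_i = 1 | h) = sigma(w_i^T h + b_i), where w_i is column i of W."""
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + b[i])
            for i in range(len(b))]


def sample_bernoulli(probs, rng):
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def cd1_update(W, b, c, x, lr, rng):
    """In-place CD-1 step: dW ~ E_data[h x^T] - (one-step Gibbs estimate of E_model[h x^T])."""
    ph0 = p_h_given_x(W, c, x)                         # positive phase (data)
    h0 = sample_bernoulli(ph0, rng)
    x1 = sample_bernoulli(p_x_given_h(W, b, h0), rng)  # one Gibbs step
    ph1 = p_h_given_x(W, c, x1)                        # negative phase (reconstruction)
    for j in range(len(c)):
        for i in range(len(b)):
            W[j][i] += lr * (ph0[j] * x[i] - ph1[j] * x1[i])
        c[j] += lr * (ph0[j] - ph1[j])
    for i in range(len(b)):
        b[i] += lr * (x[i] - x1[i])


rng = random.Random(0)
n_visible, n_hidden = 4, 3
W = [[rng.gauss(0.0, 0.01) for _ in range(n_visible)] for _ in range(n_hidden)]
b = [0.0] * n_visible  # visible biases
c = [0.0] * n_hidden   # hidden biases
data = [[1.0, 0.0, 1.0, 0.0]]  # toy binary data
for _ in range(100):
    for x in data:
        cd1_update(W, b, c, x, lr=0.1, rng=rng)
print(p_x_given_h(W, b, p_h_given_x(W, c, data[0])))
```

After pre-training, `p_h_given_x` is exactly the feed-forward operation $f(x; w)$ used to initialize the corresponding network layer.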
7 Encoder-Decoder Model
The encoding operation should preserve the essential information in the data $x$. This is verified by reconstructing $x$ with a decoder $g$ from the code $h = f(x; w)$.
Minimize the encoding error $\|h - f(x; w)\|$ and the decoding error $\|x - g(h; w)\|$, with a penalty $\lambda \|w\|_1$ to encourage local filters. $t(h)$ is a penalty on the code $h$ that encourages a special property such as sparseness [3]:
$L(w, h) = \|h - f(x; w)\| + \|x - g(h; w)\| + \lambda \|w\|_1 + \alpha\, t(h)$
The neural network feed-forward operation is an encoding operation.
Learn $W$ by repeating the following steps [1][2][3] (with $w$ randomly initialized):
1. $h^0 \leftarrow f(x; w)$
2. $h^t \leftarrow h^{t-1} - \eta\, \partial L(w, h^{t-1}) / \partial h^{t-1}$, for a few steps starting from $h^0$
3. $w \leftarrow w - \eta\, \partial L(w, h^t) / \partial w$
[1] Ranzato et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition, CVPR, 2007
[2] Ranzato et al., Efficient learning of sparse representations with an energy-based model, NIPS
[3] Ranzato et al., Sparse Feature Learning for Deep Belief Networks, NIPS
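A minimal sketch of the three-step loop above. It assumes (hypothetically) a linear encoder $f(x; w) = W_e x$, a linear decoder $g(h; w) = W_d h$, squared-error terms, $\lambda$ applied to both weight matrices, and $t(h) = \|h\|_1$. The $L_1$ terms are handled with plain subgradients, which is a simplification and not necessarily the optimizer used in [1][2][3].

```python
def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def sign(a):
    return (a > 0) - (a < 0)


def errors(x, h, We, Wd):
    enc_err = [hk - fk for hk, fk in zip(h, matvec(We, x))]  # h - f(x; w)
    dec_err = [xi - gi for xi, gi in zip(x, matvec(Wd, h))]  # x - g(h; w)
    return enc_err, dec_err


def loss(x, h, We, Wd, lam, alpha):
    """L(w, h) with squared errors, lam * ||w||_1 and t(h) = ||h||_1."""
    enc_err, dec_err = errors(x, h, We, Wd)
    l1_w = sum(abs(w) for M in (We, Wd) for row in M for w in row)
    return (sum(e * e for e in enc_err) + sum(e * e for e in dec_err)
            + lam * l1_w + alpha * sum(abs(hk) for hk in h))


def infer_code(x, We, Wd, alpha, eta, steps):
    h = matvec(We, x)                        # step 1: h^0 = f(x; w)
    for _ in range(steps):                   # step 2: descend L in h
        enc_err, dec_err = errors(x, h, We, Wd)
        grad = [2 * enc_err[k]
                - 2 * sum(Wd[i][k] * dec_err[i] for i in range(len(x)))
                + alpha * sign(h[k])
                for k in range(len(h))]
        h = [hk - eta * gk for hk, gk in zip(h, grad)]
    return h


def update_weights(x, h, We, Wd, lam, eta):
    """Step 3: one (sub)gradient step on the weights with the code h held fixed."""
    enc_err, dec_err = errors(x, h, We, Wd)
    We = [[We[k][i] - eta * (-2 * enc_err[k] * x[i] + lam * sign(We[k][i]))
           for i in range(len(x))] for k in range(len(h))]
    Wd = [[Wd[i][k] - eta * (-2 * dec_err[i] * h[k] + lam * sign(Wd[i][k]))
           for k in range(len(h))] for i in range(len(x))]
    return We, Wd


x = [1.0, 0.0]
We, Wd = [[0.5, 0.0]], [[0.5], [0.0]]
for _ in range(20):
    h = infer_code(x, We, Wd, alpha=0.01, eta=0.1, steps=5)
    We, Wd = update_weights(x, h, We, Wd, lam=0.01, eta=0.1)
print(loss(x, h, We, Wd, lam=0.01, alpha=0.01))
```

Both updates move against the gradient because $L$ is being minimized; the optimized code $h^t$ then serves as the training target that pulls the encoder toward better codes.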
8 Applications of Deep Neural Networks
Natural image patch modeling [1][8]; image classification [2][5]; text modeling [3]; human pose tracking [4]; digit recognition [6][7].
[1] Ranzato and Hinton, Modeling Pixel Means and Covariances Using Factorized Third-Order Boltzmann Machines, CVPR, 2010
[2] Lee et al. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations, ICML, 2009
[3] Salakhutdinov and Hinton, Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes, NIPS, 2007
[4] Taylor et al. Dynamical Binary Latent Variable Models for 3D Human Pose Tracking, CVPR, 2010
[5] Ranzato et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition, CVPR, 2007
[6] Salakhutdinov and Hinton, Deep Boltzmann Machines, AISTATS, 2009
[7] Salakhutdinov and Larochelle, Efficient Learning of Deep Boltzmann Machines, AISTATS, 2010
[8] Osindero and Hinton, Modeling image patches with a directed hierarchy of Markov random fields, NIPS, 2006
9 Neural Network + Gaussian Process Regression
Given a data set $x$ with labels $y$, we are interested in the probabilistic regression model
$y = f(x) + \epsilon$, with $f(x) \sim \mathcal{N}(0, K)$ and $\epsilon \sim \mathcal{N}(0, \sigma^2)$,
where $K_{ij} = \alpha \exp(-\beta (x_i - x_j)^\top (x_i - x_j))$ is the covariance function. Integrating out $f(x)$ gives the objective:
$L = \log p(y \mid x) = -\tfrac{1}{2} \log |K + \sigma^2 I| - \tfrac{1}{2} y^\top (K + \sigma^2 I)^{-1} y + C$
1. The gradient $\partial L / \partial x$ can be written down from this definition.
2. If $x$ is the response of a neural network with input $v$, $\partial x / \partial v$ is defined by the network.
3. Back-propagation through the joint model then follows from $\partial L / \partial x$ and $\partial x / \partial v$ [1].
[1] Salakhutdinov and Hinton, Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes, NIPS, 2007
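A minimal sketch of evaluating the objective $L = \log p(y \mid x)$ above, using a hand-written Cholesky factorization so the block needs only the standard library. The constant is taken to be $C = -\tfrac{n}{2}\log 2\pi$ (the usual Gaussian normalizer); the inputs and hyperparameter values are hypothetical. The gradient $\partial L / \partial x$ from step 1 is not shown.

```python
import math


def se_kernel(X, alpha, beta):
    """K_ij = alpha * exp(-beta * (x_i - x_j)^T (x_i - x_j))."""
    return [[alpha * math.exp(-beta * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_marginal_likelihood(X, y, alpha, beta, sigma2):
    """log p(y | x) = -1/2 log|K + s^2 I| - 1/2 y^T (K + s^2 I)^{-1} y - n/2 log(2 pi)."""
    n = len(y)
    K = se_kernel(X, alpha, beta)
    A = [[K[i][j] + (sigma2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(A)
    z = []  # forward substitution L z = y, so y^T A^{-1} y = z^T z
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * log_det - 0.5 * sum(zi * zi for zi in z) - 0.5 * n * math.log(2 * math.pi)


X = [[0.0], [1.0], [2.5]]  # hypothetical inputs (e.g. network outputs x = f(v))
y = [0.3, -0.1, 0.8]
print(log_marginal_likelihood(X, y, alpha=1.0, beta=0.5, sigma2=0.1))
```

The Cholesky route computes both $\log|K + \sigma^2 I|$ (twice the sum of the log-diagonal) and the quadratic form without forming an explicit inverse.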
10 Topic Models: LSA
Latent Semantic Analysis [1] maps documents to a latent semantic space of reduced dimensionality.
Given a co-occurrence table $X$ in which each row is the word histogram of a document, apply SVD: $X = U \Sigma V^\top$.
Approximate $X$ by keeping only the top few singular values in $\Sigma$: $\tilde{X} = \tilde{U} \tilde{\Sigma} \tilde{V}^\top \approx U \Sigma V^\top = X$.
Documents are represented in the latent space by the rows of $\tilde{U} \tilde{\Sigma}$, because the document inner products satisfy $\tilde{X} \tilde{X}^\top = \tilde{U} \tilde{\Sigma}^2 \tilde{U}^\top$.
[1] Deerwester et al. Indexing by latent semantic analysis
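A minimal sketch of the truncated SVD step, using power iteration with deflation as a self-contained stand-in for a library routine such as `numpy.linalg.svd`. It assumes distinct leading singular values and a start vector that is not orthogonal to the top singular vector; the toy counts are hypothetical.

```python
import math


def truncated_svd(X, k, iters=100):
    """Top-k singular triplets (s, u, v) of X, by power iteration on R^T R with deflation."""
    m, n = len(X), len(X[0])
    R = [row[:] for row in X]
    triplets = []
    for _ in range(k):
        v = [1.0] * n  # assumed start vector, not orthogonal to the top right singular vector
        for _ in range(iters):
            Rv = [sum(r * vj for r, vj in zip(row, v)) for row in R]
            w = [sum(R[i][j] * Rv[i] for i in range(m)) for j in range(n)]  # R^T R v
            norm = math.sqrt(sum(wj * wj for wj in w))
            if norm == 0.0:
                break
            v = [wj / norm for wj in w]
        Rv = [sum(r * vj for r, vj in zip(row, v)) for row in R]
        s = math.sqrt(sum(a * a for a in Rv))
        if s == 0.0:
            break
        u = [a / s for a in Rv]
        triplets.append((s, u, v))
        R = [[R[i][j] - s * u[i] * v[j] for j in range(n)] for i in range(m)]  # deflate
    return triplets


def reconstruct(triplets, m, n):
    """X~ = U~ Sigma~ V~^T from the kept triplets."""
    return [[sum(s * u[i] * v[j] for s, u, v in triplets) for j in range(n)]
            for i in range(m)]


def doc_coordinates(triplets, m):
    """Rows of U~ Sigma~: each document's coordinates in the latent space."""
    return [[s * u[i] for s, u, _ in triplets] for i in range(m)]


X = [[2.0, 1.0, 0.0, 0.0],  # toy document-word counts (rows = documents)
     [3.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 3.0]]
trip = truncated_svd(X, k=2)
print(doc_coordinates(trip, len(X)))
```

For real corpora a sparse or randomized SVD would be used instead; the loop only illustrates what "keep the top singular values" computes.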
11 Topic Models: pLSA
Probabilistic Latent Semantic Analysis [1]
Joint distribution: $p(d, w) = p(d) \sum_z p(w \mid z)\, p(z \mid d) = \sum_z p(z)\, p(d \mid z)\, p(w \mid z)$
Relationship with LSA: $U_{ik} = p(d_i \mid z_k)$, $V_{jk} = p(w_j \mid z_k)$ and $\Sigma_{kk} = p(z_k)$.
Learned with EM by alternately updating $p(z \mid w, d)$ and $p(w \mid z)$, $p(d \mid z)$, $p(z)$.
[1] Hofmann, Unsupervised learning by probabilistic latent semantic analysis, UAI
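A minimal sketch of the EM procedure above for the symmetric parameterization $p(d, w) = \sum_z p(z)\, p(d \mid z)\, p(w \mid z)$. The random initialization, iteration count and toy counts are assumptions; no tempering or other refinements are included.

```python
import math
import random


def normalize(v):
    s = sum(v)
    return [a / s for a in v]


def plsa_em(N, K, iters, seed=0):
    """EM for pLSA. N[d][w] is the count of word w in document d.

    Returns p(z), p(d|z), p(w|z) and the log-likelihood evaluated at the
    start of each iteration (before that iteration's M-step).
    """
    rng = random.Random(seed)
    D, W = len(N), len(N[0])
    pz = [1.0 / K] * K
    pd_z = [normalize([rng.random() + 0.1 for _ in range(D)]) for _ in range(K)]
    pw_z = [normalize([rng.random() + 0.1 for _ in range(W)]) for _ in range(K)]
    history = []
    for _ in range(iters):
        # E-step: p(z | d, w) proportional to p(z) p(d|z) p(w|z)
        post = [[None] * W for _ in range(D)]
        ll = 0.0
        for d in range(D):
            for w in range(W):
                joint = [pz[k] * pd_z[k][d] * pw_z[k][w] for k in range(K)]
                tot = sum(joint)  # p(d, w)
                if N[d][w] > 0:
                    ll += N[d][w] * math.log(tot)
                post[d][w] = [j / tot for j in joint] if tot > 0 else [0.0] * K
        history.append(ll)
        # M-step: re-estimate from expected counts n(d, w) p(z | d, w)
        nz = []
        for k in range(K):
            nd = [sum(N[d][w] * post[d][w][k] for w in range(W)) for d in range(D)]
            nw = [sum(N[d][w] * post[d][w][k] for d in range(D)) for w in range(W)]
            pd_z[k], pw_z[k] = normalize(nd), normalize(nw)
            nz.append(sum(nd))
        pz = normalize(nz)
    return pz, pd_z, pw_z, history


counts = [[4, 2, 0, 0],  # toy document-word counts
          [3, 3, 1, 0],
          [0, 1, 3, 4],
          [0, 0, 2, 5]]
pz, pd_z, pw_z, history = plsa_em(counts, K=2, iters=50)
print(pz, history[-1])
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a convenient sanity check for an EM implementation.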
12 Topic Models: LDA
Each document is a random mixture of corpus-wide topics. Each word is drawn from one of those topics.
13 Topic Models: LDA
Latent Dirichlet Allocation [1]: a fully generative model and an extension of pLSA.
Marginal distribution of a document: $p(w \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta$
Learned with a variational EM algorithm.
[1] Blei et al. Latent Dirichlet Allocation. JMLR, 2003
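A minimal sketch of the generative process behind the formula above: draw $\theta \sim \text{Dir}(\alpha)$, then for each word draw a topic $z_n$ from $\theta$ and a word $w_n$ from that topic's distribution in $\beta$. The document length $N$ is fixed here for simplicity (the LDA paper also draws it from a Poisson); $\alpha$, $\beta$ and the helper names are hypothetical. Variational EM is not shown.

```python
import random


def sample_dirichlet(alpha, rng):
    """A Dirichlet draw: independent Gamma(alpha_k, 1) variables, normalized."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def sample_categorical(p, rng):
    r, acc = rng.random(), 0.0
    for i, p_i in enumerate(p):
        acc += p_i
        if r < acc:
            return i
    return len(p) - 1  # guard against rounding when sum(p) is slightly below 1


def generate_document(alpha, beta, N, rng):
    """alpha: Dirichlet parameter over K topics; beta[k]: word distribution of topic k."""
    theta = sample_dirichlet(alpha, rng)    # theta ~ Dir(alpha)
    topics, words = [], []
    for _ in range(N):
        z = sample_categorical(theta, rng)  # z_n ~ Multinomial(theta)
        w = sample_categorical(beta[z], rng)  # w_n ~ p(w_n | z_n, beta)
        topics.append(z)
        words.append(w)
    return theta, topics, words


rng = random.Random(1)
alpha = [0.5, 0.5, 0.5]             # hypothetical, K = 3 topics
beta = [[0.7, 0.2, 0.1, 0.0],       # hypothetical topic-word distributions, 4 words
        [0.0, 0.1, 0.2, 0.7],
        [0.25, 0.25, 0.25, 0.25]]
theta, topics, words = generate_document(alpha, beta, N=20, rng=rng)
print(theta, words)
```

Unlike pLSA, the topic proportions $\theta$ are themselves random, which is what makes LDA a fully generative model for new documents.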
14 Object Recognition in Computer Vision
15 Extract Image Features
- Extract features from image patches (SIFT [1], HOG [2], etc.)
- Learn a dictionary from the visual features (k-means, sparse coding [3], etc.)
- Represent images by combining features (histogram, global/local pooling [3][4])
[1] Lowe, Distinctive image features from scale-invariant keypoints, IJCV, 2004
[2] Dalal and Triggs, Histograms of oriented gradients for human detection, CVPR, 2005
[3] Yang et al. Linear spatial pyramid matching using sparse coding for image classification, CVPR, 2009
[4] Boureau et al. Learning mid-level features for recognition, CVPR, 2010
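A minimal sketch of the k-means dictionary and histogram steps above (a bag of visual words). The 2-D "descriptors" are hypothetical stand-ins for SIFT or HOG vectors, and plain Lloyd iterations are assumed; sparse coding and spatial pooling are not shown.

```python
import random


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(f, centers):
    return min(range(len(centers)), key=lambda c: sq_dist(f, centers[c]))


def kmeans(features, k, iters=20, seed=0):
    """Learn a visual dictionary (codebook) of k centers with Lloyd's algorithm."""
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(features, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in features:
            clusters[nearest(f, centers)].append(f)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(image_features, centers):
    """Represent one image as a normalized histogram of visual-word assignments."""
    counts = [0] * len(centers)
    for f in image_features:
        counts[nearest(f, centers)] += 1
    return [c / len(image_features) for c in counts]


# hypothetical 2-D descriptors; real SIFT/HOG descriptors are much higher-dimensional
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
codebook = kmeans(train, k=2)
print(bow_histogram([(0.0, 0.5), (10.5, 10.0)], codebook))
```

The histogram is the image-level representation that the models on the next slide consume.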
16 Model Image Features
Discriminative model: SVM with a linear, histogram-intersection or $\chi^2$ kernel [1].
Generative model: hierarchical Bayesian models can be applied, such as extensions of the naïve Bayes model [2], the pLSA model [3][4] and the LDA model [5].
[1] Lazebnik et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, CVPR, 2006
[2] Dance et al. Visual categorization with bags of keypoints, ECCV workshop, 2004
[3] Sivic et al. Discovering objects and their location in images, ICCV, 2005
[4] Bosch et al. Scene Classification via pLSA, ECCV, 2006
[5] Fei-Fei et al. A Bayesian Hierarchical Model for Learning Natural Scene Categories, CVPR, 2005
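A minimal sketch of the three kernels named above, evaluated on normalized histograms. Several $\chi^2$ kernel variants exist; the exponential form and the `gamma` value used here are assumptions. The resulting Gram matrix is what an SVM solver accepting a precomputed kernel would take as input; the solver itself is not shown.

```python
import math


def linear_kernel(a, b):
    return sum(p * q for p, q in zip(a, b))


def intersection_kernel(a, b):
    """Histogram intersection: sum_i min(a_i, b_i)."""
    return sum(min(p, q) for p, q in zip(a, b))


def chi2_kernel(a, b, gamma=1.0):
    """Exponential chi^2 kernel exp(-gamma * sum_i (a_i - b_i)^2 / (a_i + b_i))."""
    d = sum((p - q) ** 2 / (p + q) for p, q in zip(a, b) if p + q > 0)
    return math.exp(-gamma * d)


def gram_matrix(H, kernel):
    """Kernel values between all pairs of histograms."""
    return [[kernel(a, b) for b in H] for a in H]


hists = [[0.5, 0.5, 0.0], [0.2, 0.8, 0.0], [0.0, 0.1, 0.9]]  # hypothetical histograms
for k in (linear_kernel, intersection_kernel, chi2_kernel):
    print(k.__name__, gram_matrix(hists, k))
```

Bins that are empty in both histograms are skipped in the $\chi^2$ sum to avoid dividing by zero.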
17 Bayesian Model for Object Recognition
An extension of the Bayesian topic model that includes location information [1]-[4]:
symbol | description | notes
$w_{ji}$ | k-means index (patch appearance) | $w_{ji} \sim \text{Multi}(\eta_z)$
$v_{ji}$ | object part (patch location) | $v_{ji} \sim \mathcal{N}(\mu_k, \Lambda_k)$
$z_{ji}$ | topic index | $z_{ji} \sim \text{Multi}(\pi_o)$
$\rho_j$ | object center location | $\rho_j \sim \mathcal{N}(\gamma, \varsigma)$
$o_j$ | image label |
[1] Sudderth et al. Learning Hierarchical Models of Scenes, Objects, and Parts, ICCV, 2005
[2] Sudderth et al. Describing Visual Scenes using Transformed Dirichlet Processes, NIPS, 2005
[3] Kivinen et al. Learning Multi-scale Representations of Natural Scenes Using Dirichlet Processes, ICCV, 2007
[4] Sudderth et al. Describing Visual Scenes using Transformed Objects and Parts, IJCV, 2008
18 Bayesian Model for Object Recognition
Each object is a mixture of topics. Each appearance-location pair is drawn from one of those topics.
19 My Research Result
Combine a neural network model with a topic model:
- Neural network: nonlinear transformation
- Bayesian topic model: transparent to humans
- Replace the regression component of the neural network with a Bayesian model (topic model)
- The Bayesian model takes its input from the response of the neural network
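20 What We Want to Learn
Given input data $v$ with label $y$, let $x = f_w(v)$ be the output of the neural network for input $v$. The likelihood function is given by:
$p_v(v \mid y) = p_x(f_w(v) \mid y)\, \left| \det f_w'(v) \right|$
where $p_x(f_w(v) \mid y)$ is defined by the generative model and $f_w'(v)$ is the Jacobian matrix of $f_w$ at $v$.
Applying Bayes' rule with a uniform prior over labels, the Jacobian term is the same for every label and cancels, which gives the quantity used in the loss function:
$p(y \mid v) = \frac{p_v(v \mid y)}{\sum_{\tilde{y}} p_v(v \mid \tilde{y})} = \frac{p_x(f_w(v) \mid y)}{\sum_{\tilde{y}} p_x(f_w(v) \mid \tilde{y})} = p_x(y \mid f_w(v))$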
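A minimal sketch of the Bayes-rule step above, assuming a uniform class prior as in the derivation. It works in log space with a log-sum-exp normalizer (an implementation choice, not from the slides) and shows numerically that adding the same $\log|\det f_w'(v)|$ to every class leaves $p(y \mid v)$ unchanged. The class-conditional likelihood values are hypothetical.

```python
import math


def log_sum_exp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def posterior(log_px_given_y, log_abs_det_jacobian=0.0):
    """p(y | v) from class-conditional log-likelihoods, uniform class prior.

    log p_v(v | y) = log p_x(f_w(v) | y) + log|det f_w'(v)|; the Jacobian term is
    the same for every y, so it cancels in the normalization.
    """
    logs = [lp + log_abs_det_jacobian for lp in log_px_given_y]
    z = log_sum_exp(logs)
    return [math.exp(lp - z) for lp in logs]


def loss(log_px_given_y, y):
    """-log p(y | v) for the true label y."""
    return -log_px_given_y[y] + log_sum_exp(log_px_given_y)


log_lik = [math.log(0.02), math.log(0.06), math.log(0.12)]  # hypothetical p_x(f_w(v) | y)
print(posterior(log_lik), posterior(log_lik, log_abs_det_jacobian=5.0))
print(loss(log_lik, y=2))
```

Because the Jacobian cancels, training only needs $p_x(f_w(v) \mid y)$ from the generative model, never the determinant itself.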
21 Model Overview
(Figure: (a) neural network: input $v$ (128-d) → sigmoid layer ($W_1$) → layer 1 hidden units $h$ (600 units) → linear layer ($W_2$) → layer 2 feature $x$ (25 units). (b) graphical model over class labels $y$, latent topics $z$, latent words $u$ and features $x$, with parameters $\alpha, \beta, \gamma, \pi, \eta, \phi$ and plates $S$, $M$, $K$, $N$. (c) unified model: Gaussian likelihood layer $F_2$ ($\phi = \{\mu, \sigma\}$) over $K = 200$ latent words $u$, integration layers $F_1(\eta)$ and $F_1(\pi)$ over the latent words and $M = 45$ latent topics $z$, and a Bayes layer $F_0$ over $S = 15$ class labels $y$.)
1. We first initialize the parameters $\{w_0, \pi_0, \eta_0, \phi_0\}$ by pre-training the neural network and the graphical model.
2. All parameters are then updated jointly by gradient descent: the generative model is converted into extra layers of the neural network (assuming closed-form inference in the top graphical model).
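22 Generative Model
(Figure: plate diagram with nodes $y$, $z$, $u$, $x$, parameters $\pi$, $\eta$, $\phi$, hyperparameters $\alpha$, $\beta$, $\gamma$, and plates $S$, $M$, $K$, $N$.)
1. Draw latent topic $z_j \sim \text{Multi}(\pi_{y_j})$
2. Draw latent word $u_j \sim \text{Multi}(\eta_{z_j})$
3. Draw feature vector $x_j \sim \text{Gaussian}(\phi_{u_j})$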
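A minimal sketch of the three sampling steps above. It assumes $\phi_u = \{\mu_u, \sigma_u\}$ describes a diagonal Gaussian, and uses toy sizes ($S = 2$, $M = 2$, $K = 3$, 2-D features) rather than the sizes in the model overview; all parameter values and helper names are hypothetical.

```python
import random


def sample_categorical(p, rng):
    """Index i with probability p[i]."""
    r, acc = rng.random(), 0.0
    for i, p_i in enumerate(p):
        acc += p_i
        if r < acc:
            return i
    return len(p) - 1  # guard against rounding when sum(p) is slightly below 1


def sample_feature(y, pi, eta, mu, sigma, rng):
    """Draw (z_j, u_j, x_j) for one sample with class label y."""
    z = sample_categorical(pi[y], rng)   # 1. z_j ~ Multi(pi_y)
    u = sample_categorical(eta[z], rng)  # 2. u_j ~ Multi(eta_z)
    x = [rng.gauss(m, s) for m, s in zip(mu[u], sigma[u])]  # 3. x_j ~ N(mu_u, diag(sigma_u^2))
    return z, u, x


rng = random.Random(0)
pi = [[0.8, 0.2], [0.1, 0.9]]              # S = 2 classes over M = 2 topics
eta = [[0.6, 0.4, 0.0], [0.0, 0.3, 0.7]]   # M = 2 topics over K = 3 latent words
mu = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]  # per-word Gaussian means (2-D features)
sigma = [[0.5, 0.5]] * 3                   # per-word standard deviations
samples = [sample_feature(y, pi, eta, mu, sigma, rng) for y in (0, 1) for _ in range(5)]
print(samples[0])
```

In the hybrid model the sampled $x_j$ plays the role of the network output $f_w(v_j)$; training inverts this process rather than sampling from it.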
23 Joint Optimization
Overall loss function:
$L = -\sum_j \log p(f_w(v_j) \mid y_j, \pi, \eta, \phi) + \sum_j \log \sum_{i=1}^{S} p(f_w(v_j) \mid y = i, \pi, \eta, \phi)$
Generative model likelihood function:
$p(f_w(v_j) \mid y, \pi, \eta, \phi) = \sum_{z_j=1}^{M} \sum_{u_j=1}^{K} p(f_w(v_j) \mid u_j, \phi)\, p(u_j \mid z_j, \eta)\, p(z_j \mid y, \pi)$
Trick: decompose the likelihood function into small pieces.
24-27 Unified Model
The likelihood is computed by a stack of layers on top of the network output $f_w(v_j)$:
Gaussian likelihood layer ($F_2$: $f_w(v_j) \mapsto p(f_w(v_j) \mid u_j, \phi)$)
Integration layer on $u$ ($F_1(\cdot, \eta)$): $p(f_w(v_j) \mid z_j, \eta, \phi) = \sum_{u_j=1}^{K} p(f_w(v_j) \mid u_j, \phi)\, p(u_j \mid z_j, \eta)$
Integration layer on $z$ ($F_1(\cdot, \pi)$): $p(f_w(v_j) \mid y, \pi, \eta, \phi) = \sum_{z_j=1}^{M} p(f_w(v_j) \mid z_j, \eta, \phi)\, p(z_j \mid y, \pi)$
Bayesian layer ($F_0$: $p(f_w(v_j) \mid y) \mapsto p(y \mid f_w(v_j))$), which yields the overall loss:
$L = -\sum_j \log p(f_w(v_j) \mid y_j, \pi, \eta, \phi) + \sum_j \log \sum_{i=1}^{S} p(f_w(v_j) \mid y = i, \pi, \eta, \phi)$
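A minimal sketch of the forward pass through the layers above and the resulting loss. It assumes a diagonal Gaussian for $F_2$ and works in log space with log-sum-exp (both implementation choices, not from the slides); the toy parameter values are hypothetical. Gradients with respect to $\pi$, $\eta$, $\phi$ and $w$ would come from back-propagating through these same operations and are not shown.

```python
import math


def log_sum_exp(vals):
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def gaussian_log_density(x, mu, sigma):
    """log N(x; mu, diag(sigma^2)), summed over feature dimensions."""
    return sum(-0.5 * math.log(2 * math.pi * s * s) - (xd - m) ** 2 / (2 * s * s)
               for xd, m, s in zip(x, mu, sigma))


def unified_forward(x, pi, eta, mu, sigma):
    """Return log p(x | y = i, pi, eta, phi) for every class i, layer by layer."""
    K, M = len(mu), len(eta)
    # F2: Gaussian likelihood layer, one output per latent word u
    f2 = [gaussian_log_density(x, mu[u], sigma[u]) for u in range(K)]
    # F1(., eta): integrate out u, one output per latent topic z
    f1_eta = [log_sum_exp([f2[u] + safe_log(eta[z][u]) for u in range(K)]) for z in range(M)]
    # F1(., pi): integrate out z, one output per class y
    return [log_sum_exp([f1_eta[z] + safe_log(pi_y[z]) for z in range(M)]) for pi_y in pi]


def sample_loss(log_lik, y):
    """F0 plus loss for one sample: -log p(x | y) + log sum_i p(x | y = i)."""
    return -log_lik[y] + log_sum_exp(log_lik)


def total_loss(features, labels, pi, eta, mu, sigma):
    """L summed over samples j, where features[j] plays the role of f_w(v_j)."""
    return sum(sample_loss(unified_forward(x, pi, eta, mu, sigma), y)
               for x, y in zip(features, labels))


mu, sigma = [[0.0], [3.0]], [[1.0], [1.0]]  # K = 2 latent words, 1-D features
eta = [[0.9, 0.1], [0.2, 0.8]]              # M = 2 topics over latent words
pi = [[0.7, 0.3], [0.4, 0.6]]               # S = 2 classes over topics
print(total_loss([[0.5], [2.7]], [0, 1], pi, eta, mu, sigma))
```

Each layer only needs the previous layer's outputs, which is what lets the generative model be treated as extra network layers.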
28 Toy Data
(Figure panels: input $v$; features $x$ before backprop; features $x$ after backprop.)
Toy data with 5 latent clusters drawn from 4 classes. Shape: class label (cross, dot, square, circle); color: model prediction. The feature panels visualize the input after the neural network transformation.
29 Scene Classification Result
pLSA, LDA, Neural network, HTM SVM: ± ± 1.2
HTM, Hybrid model (pre-trained), Hybrid model (fully trained), SVM: 65.5± ± ±0.6
Table 1: Classification rates of different methods on the scene classification dataset.
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationLanguage Information Processing, Advanced. Topic Models
Language Information Processing, Advanced Topic Models mcuturi@i.kyoto-u.ac.jp Kyoto University - LIP, Adv. - 2011 1 Today s talk Continue exploring the representation of text as histogram of words. Objective:
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu October 19, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network
More informationOther Topologies. Y. LeCun: Machine Learning and Pattern Recognition p. 5/3
Y. LeCun: Machine Learning and Pattern Recognition p. 5/3 Other Topologies The back-propagation procedure is not limited to feed-forward cascades. It can be applied to networks of module with any topology,
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationKnowledge Extraction from DBNs for Images
Knowledge Extraction from DBNs for Images Son N. Tran and Artur d Avila Garcez Department of Computer Science City University London Contents 1 Introduction 2 Knowledge Extraction from DBNs 3 Experimental
More informationMACHINE LEARNING AND PATTERN RECOGNITION Fall 2005, Lecture 4 Gradient-Based Learning III: Architectures Yann LeCun
Y. LeCun: Machine Learning and Pattern Recognition p. 1/3 MACHINE LEARNING AND PATTERN RECOGNITION Fall 2005, Lecture 4 Gradient-Based Learning III: Architectures Yann LeCun The Courant Institute, New
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationLecture 17: Neural Networks and Deep Learning
UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions
More informationInformation retrieval LSI, plsi and LDA. Jian-Yun Nie
Information retrieval LSI, plsi and LDA Jian-Yun Nie Basics: Eigenvector, Eigenvalue Ref: http://en.wikipedia.org/wiki/eigenvector For a square matrix A: Ax = λx where x is a vector (eigenvector), and
More informationDiscriminative Learning of Sum-Product Networks. Robert Gens Pedro Domingos
Discriminative Learning of Sum-Product Networks Robert Gens Pedro Domingos X1 X1 X1 X1 X2 X2 X2 X2 X3 X3 X3 X3 X4 X4 X4 X4 X5 X5 X5 X5 X6 X6 X6 X6 Distributions X 1 X 1 X 1 X 1 X 2 X 2 X 2 X 2 X 3 X 3
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationLearning Deep Architectures
Learning Deep Architectures Yoshua Bengio, U. Montreal CIFAR NCAP Summer School 2009 August 6th, 2009, Montreal Main reference: Learning Deep Architectures for AI, Y. Bengio, to appear in Foundations and
More informationSum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017
Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth
More informationThe XOR problem. Machine learning for vision. The XOR problem. The XOR problem. x 1 x 2. x 2. x 1. Fall Roland Memisevic
The XOR problem Fall 2013 x 2 Lecture 9, February 25, 2015 x 1 The XOR problem The XOR problem x 1 x 2 x 2 x 1 (picture adapted from Bishop 2006) It s the features, stupid It s the features, stupid The
More informationProbabilistic Graphical Models & Applications
Probabilistic Graphical Models & Applications Learning of Graphical Models Bjoern Andres and Bernt Schiele Max Planck Institute for Informatics The slides of today s lecture are authored by and shown with
More informationDeep Belief Networks are compact universal approximators
1 Deep Belief Networks are compact universal approximators Nicolas Le Roux 1, Yoshua Bengio 2 1 Microsoft Research Cambridge 2 University of Montreal Keywords: Deep Belief Networks, Universal Approximation
More informationLecture 13 Visual recognition
Lecture 13 Visual recognition Announcements Silvio Savarese Lecture 13-20-Feb-14 Lecture 13 Visual recognition Object classification bag of words models Discriminative methods Generative methods Object
More informationLearning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute
More informationDimensionality Reduction and Principle Components Analysis
Dimensionality Reduction and Principle Components Analysis 1 Outline What is dimensionality reduction? Principle Components Analysis (PCA) Example (Bishop, ch 12) PCA vs linear regression PCA as a mixture
More informationDeep Learning: Self-Taught Learning and Deep vs. Shallow Architectures. Lecture 04
Deep Learning: Self-Taught Learning and Deep vs. Shallow Architectures Lecture 04 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Self-Taught Learning 1. Learn
More informationProbabilistic & Unsupervised Learning
Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College
More informationReading Group on Deep Learning Session 2
Reading Group on Deep Learning Session 2 Stephane Lathuiliere & Pablo Mesejo 10 June 2016 1/39 Chapter Structure Introduction. 5.1. Feed-forward Network Functions. 5.2. Network Training. 5.3. Error Backpropagation.
More informationClassical Predictive Models
Laplace Max-margin Markov Networks Recent Advances in Learning SPARSE Structured I/O Models: models, algorithms, and applications Eric Xing epxing@cs.cmu.edu Machine Learning Dept./Language Technology
More informationReading Group on Deep Learning Session 4 Unsupervised Neural Networks
Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Jakob Verbeek & Daan Wynen 206-09-22 Jakob Verbeek & Daan Wynen Unsupervised Neural Networks Outline Autoencoders Restricted) Boltzmann
More informationCS 1674: Intro to Computer Vision. Final Review. Prof. Adriana Kovashka University of Pittsburgh December 7, 2016
CS 1674: Intro to Computer Vision Final Review Prof. Adriana Kovashka University of Pittsburgh December 7, 2016 Final info Format: multiple-choice, true/false, fill in the blank, short answers, apply an
More informationClick Prediction and Preference Ranking of RSS Feeds
Click Prediction and Preference Ranking of RSS Feeds 1 Introduction December 11, 2009 Steven Wu RSS (Really Simple Syndication) is a family of data formats used to publish frequently updated works. RSS
More informationScaling Neighbourhood Methods
Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
More informationDEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More informationCSci 8980: Advanced Topics in Graphical Models Gaussian Processes
CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian
More informationDeep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)
CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More information