Depth Qualification Exam Presentation


1 Depth Qualification Exam Presentation
Li Wan, Dept. of Computer Science, Courant Institute, New York University

2 Overview of Talk
1. Literature survey
   (a) Approaches to training deep neural networks
   (b) Topic models and their applications to computer vision
2. Research result: how to effectively combine a neural network model with a topic model

3 Simple Neural Network
Non-linear activation function: h_i = f(w_i^T x + b), where f could be the sigmoid function σ(x) = 1/(1 + exp(−x)) or the tanh function (normalized to the (0,1) or (−1,1) range).
[Figure: a small feed-forward neural network [1].]
[1] Bishop, Neural Networks for Pattern Recognition, 1995
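A minimal sketch (not from the talk) of the layer computation h_i = f(w_i^T x + b) in NumPy, with sigmoid and tanh as the two activation choices mentioned above; the layer sizes and random weights are purely illustrative.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # squashes to the (0, 1) range

def layer_forward(W, b, x, activation=sigmoid):
    """One hidden layer: h = f(Wx + b), applied element-wise."""
    return activation(W @ x + b)

rng = np.random.default_rng(0)
W, b, x = rng.normal(size=(3, 5)), np.zeros(3), rng.normal(size=5)
h_sigmoid = layer_forward(W, b, x)                    # values in (0, 1)
h_tanh = layer_forward(W, b, x, activation=np.tanh)   # values in (-1, 1)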

4 Deep Neural Networks
- Deep networks are better than shallow ones.
- However, standard random initialization leads to poor training and generalization error [1][3] in deep neural networks (except deep ConvNets [2]).
[1] Bengio et al., Greedy layer-wise training of deep networks, NIPS, 2007
[2] LeCun et al., Back-propagation Applied to Handwritten Zip Code Recognition, Neural Computation
[3] Hinton and Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, 2006
[4] Glorot and Bengio, Understanding the difficulty of training deep feedforward neural networks, AISTATS, 2010

5 Pre-training Deep Neural Networks
How to pre-train deep neural networks: greedy layer-wise pre-training [1] (x: input data, h: hidden layer, w: parameters).
- Generative model [2][3] (restricted Boltzmann machines): w* = argmax_w Σ_h p(x, h; w)
- Encoder (f) - decoder (g) model [4][5][6][7]: w* = argmin_{w,h} ||h − f(x; w)|| + ||x − g(h; w)|| + λ||w||_1
[1] Hinton et al., A fast learning algorithm for deep belief nets, Neural Computation, 2006
[2] Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation
[3] Hinton and Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, 2006
[4] Ranzato et al., Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition, CVPR, 2007
[5] Gregor and LeCun, Learning Fast Approximations of Sparse Coding, ICML, 2010
[6] Ranzato et al., Efficient learning of sparse representations with an energy-based model, NIPS
[7] Ranzato et al., Sparse Feature Learning for Deep Belief Networks, NIPS

6 Restricted Boltzmann Machines
- Undirected graphical model (a bipartite graph over U = {x} and V = {h}) with energy function (binary case): E(x, h) = −b^T x − c^T h − h^T W x
- Fast inference: p(h | x; w) = Π_j p(h_j | x; w), with p(h_j = 1 | x; w) = σ((W x)_j + c_j)
- Fast sampling: p(x | h; w) = Π_i p(x_i | h; w), with p(x_i = 1 | h; w) = σ((W^T h)_i + b_i)
- The neural-network feed-forward operation is f(x; w) = p(h | x; w)
- Initialize W in the neural network by maximizing p(x; w) = Σ_h p(x, h; w); the gradient is ∂ log p(x; w)/∂W = E_data[h x^T] − E_model[h x^T]
- However, E_model[h x^T] is intractable [1] because the number of possible h is exponential in its size. Contrastive divergence [2][3] and its extensions [4] approximate the model expectation with a few samples.
[1] Long and Servedio, Restricted Boltzmann Machines are Hard to Approximately Evaluate or Simulate
[2] Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation
[3] Bengio and Delalleau, Justifying and Generalizing Contrastive Divergence, Neural Computation
[4] Nair and Hinton, Rectified Linear Units Improve Restricted Boltzmann Machines, ICML, 2010
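A minimal CD-1 sketch (an assumed NumPy implementation, not code from the talk): E_data[h x^T] is computed from the data and E_model[h x^T] is approximated after a single Gibbs step. The names W, b (visible bias), and c (hidden bias) follow the energy function above.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, b, c, x, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) update for a binary RBM.
    W: (n_hidden, n_visible), b: visible bias, c: hidden bias, x: (n, n_visible) batch."""
    rng = np.random.default_rng() if rng is None else rng
    ph_data = sigmoid(x @ W.T + c)                            # p(h = 1 | x)
    h = (rng.random(ph_data.shape) < ph_data).astype(float)   # sample h ~ p(h | x)
    px = sigmoid(h @ W + b)                                   # p(x = 1 | h)
    x_neg = (rng.random(px.shape) < px).astype(float)         # sample a reconstruction
    ph_neg = sigmoid(x_neg @ W.T + c)
    n = x.shape[0]
    W += lr * (ph_data.T @ x - ph_neg.T @ x_neg) / n          # E_data[h x^T] - E_model[h x^T]
    c += lr * (ph_data - ph_neg).mean(axis=0)
    b += lr * (x - x_neg).mean(axis=0)
    return W, b, c

rng = np.random.default_rng(1)
W, b, c = 0.01 * rng.normal(size=(8, 20)), np.zeros(20), np.zeros(8)
x = (rng.random((32, 20)) < 0.3).astype(float)                # toy binary batch
W, b, c = cd1_update(W, b, c, x, rng=rng)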

7 Encoder-Decoder Model
- The encoding operation should preserve the essential information of the data x. Verify this by reconstructing x with the decoder g from the code h = f(x; w).
- Minimize the encoding error ||h − f(x; w)|| and the decoding error ||x − g(h; w)||, with a penalty λ||w||_1 to encourage local filters and a penalty t(h) on the code h to encourage a special property such as sparseness [3]:
  L(w, h) = ||h − f(x; w)|| + ||x − g(h; w)|| + λ||w||_1 + α t(h)
- The neural-network feed-forward operation is an encoding operation.
- Learn w by repeating the following steps [1][2][3] (with randomly initialized w):
  h_0 ← f(x; w)
  h_t ← h_{t−1} − η ∂L(w, h_{t−1})/∂h_{t−1}   (a few steps, starting from h_0)
  w ← w − η ∂L(w, h_t)/∂w
[1] Ranzato et al., Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition, CVPR, 2007
[2] Ranzato et al., Efficient learning of sparse representations with an energy-based model, NIPS
[3] Ranzato et al., Sparse Feature Learning for Deep Belief Networks, NIPS
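A minimal sketch of this alternating code/weight optimization, under assumptions made for illustration only: a linear encoder f(x; We) = We x, a linear decoder g(h; Wd) = Wd h, squared-error terms, and t(h) = ||h||_1 as the sparsity penalty. It is not the exact model of the cited papers.

import numpy as np

def loss_grads(We, Wd, x, h, lam=1e-3, alpha=0.1):
    """Gradients of L = ||h - We x||^2 + ||x - Wd h||^2 + lam*||W||_1 + alpha*||h||_1."""
    enc_err = h - We @ x
    dec_err = x - Wd @ h
    dL_dh = 2 * enc_err - 2 * Wd.T @ dec_err + alpha * np.sign(h)
    dL_dWe = -2 * np.outer(enc_err, x) + lam * np.sign(We)
    dL_dWd = -2 * np.outer(dec_err, h) + lam * np.sign(Wd)
    return dL_dh, dL_dWe, dL_dWd

def train_step(We, Wd, x, eta=0.01, code_steps=5):
    h = We @ x                                      # h_0 <- f(x; w)
    for _ in range(code_steps):                     # a few descent steps on the code h
        dL_dh, _, _ = loss_grads(We, Wd, x, h)
        h -= eta * dL_dh
    _, dL_dWe, dL_dWd = loss_grads(We, Wd, x, h)    # one descent step on the weights
    return We - eta * dL_dWe, Wd - eta * dL_dWd

rng = np.random.default_rng(0)
We, Wd = 0.1 * rng.normal(size=(10, 20)), 0.1 * rng.normal(size=(20, 10))
for x in rng.normal(size=(50, 20)):                 # 50 toy 20-d inputs
    We, Wd = train_step(We, Wd, x)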

8 Applications of Deep Neural Networks
- Natural image patch modeling [1][8]
- Image classification [2][5]
- Text modeling [3]
- Human pose tracking [4]
- Digit recognition [6][7]
[1] Ranzato and Hinton, Modeling Pixel Means and Covariances Using Factorized Third-Order Boltzmann Machines, CVPR, 2010
[2] Lee et al., Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations, ICML, 2009
[3] Salakhutdinov and Hinton, Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes, NIPS, 2007
[4] Taylor et al., Dynamical Binary Latent Variable Models for 3D Human Pose Tracking, CVPR, 2010
[5] Ranzato et al., Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition, CVPR, 2007
[6] Salakhutdinov and Hinton, Deep Boltzmann Machines, AISTATS, 2009
[7] Salakhutdinov and Larochelle, Efficient Learning of Deep Boltzmann Machines, AISTATS, 2010
[8] Osindero and Hinton, Modeling image patches with a directed hierarchy of Markov random fields, NIPS, 2006

9 Neural Network + Gaussian Regression
Given a data set x with labels y, we are interested in the following probabilistic regression model:
  y = f(x) + ε, with f(x) ~ N(0, K) and ε ~ N(0, σ²)
Here K_ij = α exp(−β (x_i − x_j)^T (x_i − x_j)) is the covariance function. The loss function −log p(y | x) is obtained by integrating out f(x):
  L = −log p(y | x) = ½ log|K + σ²I| + ½ y^T (K + σ²I)^{−1} y + C
1. The gradient ∂L/∂x can be written down from this definition.
2. If x is the response of a neural network with input v, ∂x/∂v is defined.
3. Back-propagation through the joint model follows from ∂L/∂x and ∂x/∂v [1].
[1] Salakhutdinov and Hinton, Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes, NIPS, 2007
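A minimal sketch (an assumed implementation, not the paper's code) of the loss L above for the squared-exponential covariance; in the joint model, x would be the neural-network response f_w(v), and α, β, σ² are treated as fixed here.

import numpy as np

def gp_nll(x, y, alpha=1.0, beta=1.0, sigma2=0.1):
    """L = 1/2 log|K + sigma^2 I| + 1/2 y^T (K + sigma^2 I)^{-1} y (constant dropped)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)      # pairwise squared distances
    K = alpha * np.exp(-beta * d2) + sigma2 * np.eye(len(x))
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * logdet + 0.5 * y @ np.linalg.solve(K, y)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(30, 5)), rng.normal(size=30)          # toy features and labels
print(gp_nll(x, y))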

10 Topic Models: LSA
Latent Semantic Analysis [1]: map documents into a latent semantic space of reduced dimensionality.
- Given a co-occurrence table X in which each row is a histogram of words
- Apply SVD to X: X = U Σ V^T
- Approximate X by keeping only the top-k singular values of Σ: X_k = U_k Σ_k V_k^T ≈ U Σ V^T = X
- The co-occurrence table in the latent space is U_k Σ_k, because the inner-product matrix is X_k X_k^T = U_k Σ_k² U_k^T
[1] Deerwester et al., Indexing by latent semantic analysis
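A minimal NumPy sketch of the truncation above; the toy count matrix and the choice k = 20 are illustrative.

import numpy as np

def lsa(X, k):
    """Return the rank-k approximation of X and the documents' latent coordinates."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k]
    X_k = U_k @ np.diag(s_k) @ Vt_k        # rank-k approximation of X
    docs_latent = U_k @ np.diag(s_k)       # X_k X_k^T = docs_latent @ docs_latent.T
    return X_k, docs_latent

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(100, 500)).astype(float)   # toy document-word count table
X_k, docs_latent = lsa(X, k=20)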

11 Topic Models: pLSA
Probabilistic Latent Semantic Analysis [1]
- Joint distribution: p(d, w) = p(d) Σ_z p(w | z) p(z | d) = Σ_z p(z) p(d | z) p(w | z)
- Relationship with LSA: U_ik = p(d_i | z_k), V_jk = p(w_j | z_k), Σ_kk = p(z_k)
- Learn with EM by alternately updating p(z | w, d) and p(w | z), p(d | z), p(z)
[1] Hofmann, Unsupervised learning by probabilistic latent semantic analysis, UAI
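A minimal sketch of these EM updates on a document-word count matrix n(d, w), using the symmetric parameterization p(d, w) = Σ_z p(z) p(d|z) p(w|z). The dense (topics × documents × words) posterior array is an implementation shortcut that only suits toy-sized data.

import numpy as np

def plsa_em(n_dw, n_topics, n_iters=50, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    D, W = n_dw.shape
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_z = rng.dirichlet(np.ones(D), size=n_topics)   # p(d|z), shape (Z, D)
    p_w_z = rng.dirichlet(np.ones(W), size=n_topics)   # p(w|z), shape (Z, W)
    for _ in range(n_iters):
        # E-step: p(z|d,w) proportional to p(z) p(d|z) p(w|z)
        post = p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]
        post /= post.sum(axis=0, keepdims=True) + 1e-12
        # M-step: re-estimate p(z), p(d|z), p(w|z) from expected counts
        counts = post * n_dw[None, :, :]
        p_z = counts.sum(axis=(1, 2)); p_z /= p_z.sum()
        p_d_z = counts.sum(axis=2); p_d_z /= p_d_z.sum(axis=1, keepdims=True)
        p_w_z = counts.sum(axis=1); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    return p_z, p_d_z, p_w_z

n_dw = np.random.default_rng(1).poisson(0.5, size=(20, 100)).astype(float)
p_z, p_d_z, p_w_z = plsa_em(n_dw, n_topics=5)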

12 Topic Models: LDA
- Each document is a random mixture of corpus-wide topics
- Each word is drawn from one of those topics

13 Topic Models: LDA
Latent Dirichlet Allocation [1]
- Fully generative model: an extension of pLSA
- Joint distribution: p(w | α, β) = ∫ p(θ | α) ( Π_{n=1}^N Σ_{z_n} p(z_n | θ) p(w_n | z_n, β) ) dθ
- Learn with a variational EM algorithm
[1] Blei et al., Latent Dirichlet Allocation, JMLR, 2003

14 Object Recognition in Computer Vision

15 Extract Image Features
- Extract features from image patches (SIFT [1], HOG [2], etc.)
- Learn a dictionary from the visual features (k-means, sparse coding [3], etc.)
- Represent images by combining features (histogram, global/local pooling [3][4])
[1] Lowe, Distinctive image features from scale-invariant keypoints, IJCV, 2004
[2] Dalal and Triggs, Histograms of oriented gradients for human detection, CVPR, 2005
[3] Yang et al., Linear spatial pyramid matching using sparse coding for image classification, CVPR, 2009
[4] Boureau et al., Learning mid-level features for recognition, CVPR, 2010
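A minimal bag-of-visual-words sketch of the pipeline above: quantize local descriptors against a dictionary and represent the image as a normalized word histogram. The 128-d descriptors (SIFT-sized) and the 200-entry codebook are illustrative; the dictionary would normally come from k-means or sparse coding.

import numpy as np

def nearest_word(descriptors, dictionary):
    """Assign each descriptor to its nearest codebook entry."""
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, dictionary):
    words = nearest_word(descriptors, dictionary)
    hist = np.bincount(words, minlength=len(dictionary)).astype(float)
    return hist / max(hist.sum(), 1.0)      # normalized visual-word histogram

rng = np.random.default_rng(0)
dictionary = rng.normal(size=(200, 128))    # e.g., 200 visual words over 128-d features
descriptors = rng.normal(size=(500, 128))   # descriptors extracted from one image
h = bow_histogram(descriptors, dictionary)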

16 Model Image Features
- Discriminative model: SVM with a linear, histogram-intersection, or χ² kernel [1]
- Generative model: hierarchical Bayesian models can be applied, such as extensions of the naïve Bayes model [2], the pLSA model [3][4], and the LDA model [5]
[1] Lazebnik et al., Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, CVPR, 2006
[2] Dance et al., Visual categorization with bags of keypoints, ECCV workshop, 2004
[3] Sivic et al., Discovering objects and their location in images, ICCV, 2005
[4] Bosch et al., Scene Classification via pLSA, ECCV, 2006
[5] Fei-Fei et al., A Bayesian Hierarchical Model for Learning Natural Scene Categories, CVPR, 2005
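A minimal sketch of two of the kernels mentioned above (histogram intersection and χ²) computed between bag-of-words histograms; the resulting Gram matrix would then be passed to an SVM. The γ value of the χ² kernel is an illustrative choice.

import numpy as np

def hist_intersection_kernel(H1, H2):
    """K[i, j] = sum_k min(H1[i, k], H2[j, k])."""
    return np.minimum(H1[:, None, :], H2[None, :, :]).sum(-1)

def chi2_kernel(H1, H2, gamma=1.0, eps=1e-12):
    """K[i, j] = exp(-gamma * sum_k (H1[i,k] - H2[j,k])^2 / (H1[i,k] + H2[j,k]))."""
    num = (H1[:, None, :] - H2[None, :, :]) ** 2
    den = H1[:, None, :] + H2[None, :, :] + eps
    return np.exp(-gamma * (num / den).sum(-1))

rng = np.random.default_rng(0)
H = rng.dirichlet(np.ones(200), size=10)    # 10 normalized word histograms
K_hi, K_chi2 = hist_intersection_kernel(H, H), chi2_kernel(H, H)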

17 Bayesian Model for Object Recognition
An extension of the Bayesian topic model that includes location information [1]-[4]:
  symbol | description                      | distribution
  w_ji   | k-means index (patch appearance) | w_ji ~ Multi(η_z)
  v_ji   | object part (patch location)     | v_ji ~ N(μ_k, Λ_k)
  z_ji   | topic index                      | z_ji ~ Multi(π_o)
  ρ_j    | object center location           | ρ_j ~ N(γ, ς)
  o_j    | image label                      |
[1] Sudderth et al., Learning Hierarchical Models of Scenes, Objects, and Parts, ICCV, 2005
[2] Sudderth et al., Describing Visual Scenes using Transformed Dirichlet Processes, NIPS, 2005
[3] Kivinen et al., Learning Multi-scale Representations of Natural Scenes Using Dirichlet Processes, ICCV, 2007
[4] Sudderth et al., Describing Visual Scenes using Transformed Objects and Parts, IJCV, 2008

18 Bayesian Model for Object Recognition
- Each object is a mixture of topics
- Each appearance-location pair is drawn from one of those topics

19 My Research Result
Combine a neural network model with a topic model:
- Neural network: nonlinear transformation
- Bayesian topic model: transparent to humans
- Replace the regression component of the neural network with a Bayesian model (topic model)
- The Bayesian model takes its input from the response of the neural network

23 What We Want to Learn
Given input data v with label y, let x = f_w(v) be the output of the neural network on input v. The likelihood is
  p_v(v | y) = p_x(f_w(v) | y) · |det (f_w)'(v)|
where p_x(f_w(v) | y) is defined by the generative model and (f_w)'(v) is the Jacobian matrix.
Applying Bayes' rule (with a uniform prior over classes), the Jacobian factor cancels and we obtain the loss function from
  p(y | v) = p_v(v | y) / Σ_ỹ p_v(v | ỹ) = p_x(f_w(v) | y) / Σ_ỹ p_x(f_w(v) | ỹ) = p_x(y | f_w(v))
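A minimal sketch of this posterior computation: with a uniform class prior the Jacobian factor is shared by numerator and denominator and cancels, so only the class-conditional log-likelihoods log p_x(f_w(v) | y = i) are needed. The toy values are illustrative.

import numpy as np

def class_posterior(log_lik):
    """log_lik[i] = log p_x(f_w(v) | y = i); returns the posterior p(y | v)."""
    p = np.exp(log_lik - log_lik.max())     # subtract the max for numerical stability
    return p / p.sum()

posterior = class_posterior(np.array([-10.2, -9.7, -12.1]))   # toy log-likelihoods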

25 Model Overview
[Figure: (a) the neural network: input v (128-d) → sigmoid layer (W_1) → hidden units h (600 units) → linear layer (W_2) → feature x (25 units); (b) the graphical model with class label y (S = 15), latent topic z (M = 45), latent word u (K = 200), Gaussian parameters φ over the feature x, and hyperparameters α, β, γ; (c) the unified model, in which the graphical model becomes extra layers on top of the network: F_2 Gaussian likelihood (φ = {μ, σ}), F_1(η) integration over the latent word u, F_1(π) integration over the latent topic z, and F_0 Bayes layer producing the class label y.]
1. We first initialize the parameters {w_0, π_0, η_0, φ_0} by pre-training the neural network and the graphical model.
2. The parameters are then jointly updated by gradient descent: the generative model is converted into extra layers of the neural network (assuming closed-form inference in the top graphical model).

28 Generative Model
[Plate diagram: label y, topic z, word u, feature x, with parameters π (hyperprior α), η (hyperprior β), φ (hyperprior γ); plate sizes S, M, K, and n_i patches in each of N images.]
For each patch j of an image with label y_i:
1. Draw a latent topic z_j ~ Multi(π_{y_i})
2. Draw a latent word u_j ~ Multi(η_{z_j})
3. Draw a feature vector x_j ~ Gaussian(φ_{u_j})
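A minimal sketch of ancestral sampling from this generative process for one patch of an image with label y. The parameter shapes (π: S×M, η: M×K, φ: K Gaussians over a 25-d feature) and the random parameter values are assumptions for illustration.

import numpy as np

def sample_patch(y, pi, eta, mu, Sigma, rng):
    z = rng.choice(len(pi[y]), p=pi[y])            # latent topic  z ~ Multi(pi_y)
    u = rng.choice(len(eta[z]), p=eta[z])          # latent word   u ~ Multi(eta_z)
    x = rng.multivariate_normal(mu[u], Sigma[u])   # feature       x ~ N(mu_u, Sigma_u)
    return z, u, x

S, M, K, D = 15, 45, 200, 25
rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(M), size=S)             # p(z | y), one row per class
eta = rng.dirichlet(np.ones(K), size=M)            # p(u | z), one row per topic
mu, Sigma = rng.normal(size=(K, D)), np.stack([np.eye(D)] * K)
z, u, x = sample_patch(y=0, pi=pi, eta=eta, mu=mu, Sigma=Sigma, rng=rng)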

29 Joint Optimization
Overall loss function:
  L = −Σ_j log p(f_w(v_j) | y_j, π, η, φ) + Σ_j log Σ_{i=1}^S p(f_w(v_j) | y = i, π, η, φ)
Generative model likelihood function:
  p(f_w(v_j) | y, π, η, φ) = Σ_{z_j=1}^M Σ_{u_j=1}^K p(f_w(v_j) | u_j, φ) p(u_j | z_j, η) p(z_j | y, π)
Trick: decompose the likelihood function into small pieces.
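A minimal sketch of evaluating the class-conditional likelihood above by summing out the latent word u and topic z in the log domain; the tiny random parameter set is purely illustrative, and SciPy supplies the Gaussian log-density and log-sum-exp.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def log_lik_given_class(x, y, pi, eta, mu, Sigma):
    """log p(f_w(v) = x | y) = log sum_z sum_u N(x; mu_u, Sigma_u) p(u | z, eta) p(z | y, pi)."""
    log_px_u = np.array([multivariate_normal.logpdf(x, mu[u], Sigma[u])
                         for u in range(len(mu))])                   # one term per word u
    log_pz = logsumexp(log_px_u[None, :] + np.log(eta), axis=1)      # sum over u, per topic z
    return logsumexp(log_pz + np.log(pi[y]))                         # sum over topics z

S, M, K, D = 3, 4, 5, 2                                              # tiny toy sizes
rng = np.random.default_rng(0)
pi, eta = rng.dirichlet(np.ones(M), size=S), rng.dirichlet(np.ones(K), size=M)
mu, Sigma = rng.normal(size=(K, D)), np.stack([np.eye(D)] * K)
print(log_lik_given_class(rng.normal(size=D), y=0, pi=pi, eta=eta, mu=mu, Sigma=Sigma))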

32 Unified Model
The likelihood factors map onto layers of the unified network:
  p(f_w(v_j) | y, π, η, φ) = Σ_{z_j=1}^M Σ_{u_j=1}^K p(f_w(v_j) | u_j, φ) p(u_j | z_j, η) p(z_j | y, π)
- Gaussian likelihood layer F_2: f_w(v_j) → p(f_w(v_j) | u_j, φ)
- Integration layer over u, F_1(·, η): multiplies by p(u_j | z_j, η) and sums over u_j
- Integration layer over z, F_1(·, π): multiplies by p(z_j | y, π) and sums over z_j
- Bayes layer F_0: p(f_w(v_j) | y) → p(y | f_w(v_j)), giving the overall loss
  L = −Σ_j log p(f_w(v_j) | y_j, π, η, φ) + Σ_j log Σ_{i=1}^S p(f_w(v_j) | y = i, π, η, φ)

36 Toy Data
[Figure panels: input v; features x (before backprop); features x (after backprop).]
- 2-D data with 5 latent clusters drawn from 4 classes
- Shape: class label (cross, dot, square, circle); color: model prediction
- Visualization of the input after the neural network transformation

37 Scene Classification Result
Table 1: Classification rates of different methods on the scene classification dataset
  pLSA / LDA / Neural network / HTM / SVM: ± ± 1.2
  HTM / Hybrid model (pre-trained) / Hybrid model (fully trained) / SVM: 65.5± ± ±0.6
