Spectral Hashing: Learning to Leverage 80 Million Images

1 Spectral Hashing: Learning to Leverage 80 Million Images Yair Weiss, Antonio Torralba, Rob Fergus Hebrew University, MIT, NYU

2 Outline Motivation: Brute Force Computer Vision. Semantic Hashing. Spectral Hashing.

3 Motivation Brute Force Computer Vision using millions of labeled images (Torralba et al., Hays and Efros, Snavely et al.)

4 Tiny Images dataset Query search engines with 80K nouns in English. One thousand images each: 80 Million Images

5 Twin Jet

6 Mohammed

7 Killer Whale

8 Brute Force Recognition?

9 Brute Force Recognition?

10 Why this won't work: Grandmother cell reborn. Similarity between images. Noisy Labels. Efficient search.

11 Some Inspiration

12 Why this won't work: Grandmother cell reborn. Similarity between images. Noisy Labels. Efficient search.

13 Semantic Hashing. Diagram labels: Address Space; Query image; Semantic Hash Function; Images in database; Semantically similar images; Query address. Short (32-bit) codes. Hamming distance ≈ semantic distance. (Salakhutdinov and Hinton, 2007)
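
As a minimal sketch of how short binary codes support fast lookup, the snippet below computes Hamming distances between one 32-bit query code and a small array of database codes using XOR and a bit count. The function name `hamming_distances` and the toy codes are illustrative assumptions, not part of the slides.

```python
import numpy as np

def hamming_distances(query_code, database_codes):
    """Hamming distance between one 32-bit code and an array of 32-bit codes."""
    # XOR leaves a 1 in every bit position where the two codes differ.
    xor = np.bitwise_xor(database_codes, np.uint32(query_code))
    # View each uint32 as 4 bytes, unpack to 32 bits, and count the ones.
    bits = np.unpackbits(xor.view(np.uint8).reshape(-1, 4), axis=1)
    return bits.sum(axis=1)

# Toy database of four codes (hypothetical values).
database = np.array([0b1011, 0b1111, 0b0000, 0xFFFFFFFF], dtype=np.uint32)
query = 0b1010

dists = hamming_distances(query, database)
# Indices of database items within a small Hamming ball around the query.
neighbors = np.flatnonzero(dists < 2)
```

In a semantic-hashing setting the code itself can also serve as a memory address, so nearby items are found by probing addresses that differ from the query in a few bits rather than by scanning the whole database.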

14 Constructing Codes so that Hamming distance ≈ semantic distance. Deep Neural Network (Salakhutdinov and Hinton 07). Random Projections, LSH (Andoni and Indyk 06). Boosting (Shakhnarovich et al. 03).

15 Deep Neural Network (Salakhutdinov and Hinton 07)
[Figure: network diagram. Surviving labels: layer sizes 2000, 500, 500, 32; Top Layer Binary Codes; Code Layer; RBM; Gaussian Noise; weights W1, W2, W3. Panel titles: The Deep Generative Model; Recursive Pretraining; Fine-tuning.]
Figure 2: Left panel: The deep generative model. Middle panel: Pretraining consists of learning a stack of RBMs in which the feature activations of one RBM are treated as data by the next RBM. Right panel: After pretraining, the RBMs are unrolled to create a multi-layer autoencoder that is fine-tuned by backpropagation.
Paper text visible on the slide (fragments): First, there are limitations on the types of structure that can be represented efficiently by a single layer of hidden variables. We will show that a network with multiple hidden layers and with millions of parameters can discover latent representations that work much better for information retrieval. Second, all of these text retrieval algorithms are based on computing a similarity measure between a query document and other documents in the collection. The similarity is computed either directly in the word space or in a low-dimensional latent space. If this is done naively, the retrieval time complexity of these models is O(NV), where N is the size of the document corpus and V is the size of vocabulary or dimensionality [...]
[...] of hidden variables at a time [8]. After learning is complete, the mapping from a word-count vector to the states of the top-level variables is fast, requiring only a matrix multiplication followed by a componentwise non-linearity for each hidden layer. After the greedy, layer-by-layer training, the generative model is not significantly better than a model with only one hidden layer. To take full advantage of the multiple hidden layers, the layer-by-layer learning must be treated as a pretraining stage that finds a good region of the parameter space. Starting in this region, a gradient search can then fine-tune the model parameters to produce a much better model [10].

16 LSH Claim: If the $y_k$ are random linear thresholds, then Hamming distance is asymptotically monotonic with Euclidean distance. (Andoni and Indyk 06)
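
The claim can be illustrated with a small experiment. The sketch below is one common variant, assumed here for illustration: each bit thresholds a random Gaussian projection at zero, and the data are normalized to unit length so that the angle between points (which these bits track) is monotonic with Euclidean distance. The names `lsh_codes`, `euclid`, and `hamming` are hypothetical.

```python
import numpy as np

def lsh_codes(X, n_bits, rng):
    """Binary codes from random linear thresholds: bit k = [w_k . x > 0]."""
    W = rng.standard_normal((X.shape[1], n_bits))
    return (X @ W > 0).astype(np.uint8)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
# Assumption: unit-norm points, so angle and Euclidean distance are monotonically related.
X /= np.linalg.norm(X, axis=1, keepdims=True)

codes = lsh_codes(X, n_bits=512, rng=rng)

# Distances from point 0 to every other point, in both spaces.
euclid = np.linalg.norm(X - X[0], axis=1)
hamming = (codes != codes[0]).sum(axis=1)

# With many bits, Hamming distance closely follows Euclidean distance.
corr = np.corrcoef(euclid[1:], hamming[1:])[0, 1]
```

The agreement only becomes tight as the number of bits grows, which is why data-independent codes of this kind tend to need long codes.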

17 Our Approach Optimization problem for best hashing code. NP-hard → spectral relaxation → eigenvectors → eigenfunctions → simple algorithm. State-of-the-art results.

18 Optimization Input: $\{x_i\}$ in semantic feature space, $W_{ij} = \exp(-\|x_i - x_j\|^2 / \sigma^2)$. Output: $y_i \in \{-1, 1\}^k$. Good Code: (1) Small Hamming distance between neighbors. (2) Bits fire 50% of the time and are independent.

19 Graph Partitioning minimize: $\sum_{ij} W_{ij} \|y_i - y_j\|^2$ subject to: $y_i \in \{-1, 1\}^k$, $\sum_i y_i = 0$, $\frac{1}{N} \sum_i y_i y_i^T = I$

20 Graph Partitioning minimize: $\sum_{ij} W_{ij} \|y_i - y_j\|^2$ subject to: $y_i \in \{-1, 1\}^k$, $\sum_i y_i = 0$, $\frac{1}{N} \sum_i y_i y_i^T = I$. Observation: NP-hard even for one bit.

21 Graph Partitioning minimize: $\sum_{ij} W_{ij} \|y_i - y_j\|^2$ subject to: $y_i \in \{-1, 1\}^k$, $\sum_i y_i = 0$, $\frac{1}{N} \sum_i y_i y_i^T = I$. Relaxation: smallest eigenvectors of the graph Laplacian.
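
A minimal sketch of the relaxation on a toy dataset, assuming the unnormalized Laplacian $L = D - W$ with $D$ the diagonal degree matrix: drop the constraint that $y_i \in \{-1, 1\}^k$, take the eigenvectors of $L$ with the smallest eigenvalues (skipping the trivial constant one), and threshold them at zero. The function name and the two-cluster data are illustrative assumptions.

```python
import numpy as np

def spectral_relaxation_codes(X, n_bits, sigma):
    """Codes for the training points from the smallest Laplacian eigenvectors."""
    # Affinity W_ij = exp(-||x_i - x_j||^2 / sigma^2).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / sigma**2)
    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(L)
    # Column 0 is the constant eigenvector (eigenvalue 0); skip it.
    Y = eigvecs[:, 1:n_bits + 1]
    # Threshold the relaxed real-valued solution to get {-1, +1} bits.
    return np.where(Y >= 0, 1, -1)

rng = np.random.default_rng(0)
# Two well-separated clusters of 20 points each (toy data).
X = np.vstack([rng.normal(-3, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
codes = spectral_relaxation_codes(X, n_bits=1, sigma=3.0)
```

This only produces codes for the points used to build $W$; a new query point has no entry in the eigenvectors, which is the out-of-sample problem.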

22 Out of Sample Extension Nyström? Too expensive: calculating the Nyström extension is as expensive as exhaustive nearest neighbor search.

23 Out of Sample Extension Using Eigenfunctions Assume the $x$ are i.i.d. samples from $p(x)$. Calculate the limit of the eigenvectors as the number of points goes to infinity. (Coifman et al. 05, Belkin and Niyogi 07, Bengio et al. 04, Nadler et al. 08)

24 Graph Partitioning minimize: $\sum_{ij} W_{ij} \|y_i - y_j\|^2$ subject to: $y_i \in \{-1, 1\}^k$, $\sum_i y_i = 0$, $\frac{1}{N} \sum_i y_i y_i^T = I$. Relaxation: smallest eigenvectors of the graph Laplacian.

25 Out of Sample Extensions with Eigenfunctions minimize: $\int \|y(x_1) - y(x_2)\|^2 \, W(x_1, x_2) \, p(x_1) p(x_2) \, dx_1 dx_2$ subject to: $y(x) \in \{-1, 1\}^k$, $\int y(x) p(x) \, dx = 0$, $\int y(x) y(x)^T p(x) \, dx = I$. Relaxation: smallest eigenfunctions of Laplace-Beltrami.

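26 Analytical Eigenfunctions for ND uniform If each dimension is uniform on $[a_i, b_i]$ then the eigenfunctions are products of 1D sinusoids. $\Phi_k(x) = \sin\left(\frac{\pi}{2} + \frac{k\pi}{b - a} x\right)$, $\lambda_k = 1 - e^{-\frac{\epsilon^2}{2} \left|\frac{k\pi}{b - a}\right|^2}$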
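
The two formulas can be evaluated directly. The sketch below assumes a single dimension with $a = 0$, so that $x$ is measured from the left edge of the interval; the function names and the value of `eps` are illustrative choices.

```python
import numpy as np

def eigenfunction(x, k, a, b):
    """Phi_k(x) = sin(pi/2 + k*pi/(b - a) * x) for a 1D uniform distribution."""
    return np.sin(np.pi / 2 + k * np.pi / (b - a) * x)

def eigenvalue(k, a, b, eps):
    """lambda_k = 1 - exp(-(eps^2 / 2) * |k*pi/(b - a)|^2)."""
    return 1 - np.exp(-(eps**2 / 2) * (k * np.pi / (b - a)) ** 2)

x = np.array([0.25, 0.75])
# Thresholding the first mode at zero splits [0, 1] at its midpoint.
first_mode_bits = np.sign(eigenfunction(x, k=1, a=0.0, b=1.0))

# Eigenvalues grow with the mode index k and shrink as the interval widens.
lam_k1 = eigenvalue(1, 0.0, 1.0, eps=0.1)
lam_k2 = eigenvalue(2, 0.0, 1.0, eps=0.1)
lam_wide = eigenvalue(1, 0.0, 2.0, eps=0.1)
```

Because small eigenvalues are preferred, low-frequency modes along wide dimensions are selected first, so dimensions with a larger range receive more bits.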

27 Pairwise Independence too weak Thresholded eigenfunctions can be deterministic functions of other bits. Current solution: use only single-dimension eigenfunctions.

28 Experiments - Synthetic [Plots for a 2D uniform distribution: proportion of good neighbors for Hamming distance < (threshold not recoverable) versus number of bits, comparing LSH, stumps boosting SSC, RBM (two hidden layers), spectral hashing, RBM + spectral hashing, and boosting + spectral hashing. Additional label: Training samples. Panels for LSH, Boosting SSC, RBM (two hidden layers), and Spectral hashing at a) 3 bits, b) 7 bits, c) 15 bits.]

29 Experiments - Real Data Approximate $p(x)$ with a multidimensional rectangle. Works well despite the bad assumption. Semantic distance: Euclidean distance in GIST descriptor space.
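
Putting the pieces together, the following is a hedged sketch of the overall procedure under the rectangle assumption, not the authors' released code: rotate the data with PCA, fit a bounding interval per principal direction, rank the single-dimension modes by frequency (the analytical eigenvalue is monotonic in it), keep the lowest ones, and threshold the sinusoids at zero. The function names, the dictionary layout, and the toy data are assumptions made for illustration.

```python
import numpy as np

def train_spectral_hashing(X, n_bits):
    """Fit PCA directions and pick the n_bits lowest-frequency 1D modes."""
    mean = X.mean(axis=0)
    Xc = X - mean
    n_pcs = min(n_bits, X.shape[1])
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pcs = Vt[:n_pcs].T
    Z = Xc @ pcs
    # Approximate p(x) by a rectangle: one interval [lo, hi] per direction.
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    # Frequency k*pi/(b - a) for every (mode k, dimension) pair.
    modes = np.arange(1, n_bits + 1)
    omega = modes[:, None] * np.pi / (hi - lo)[None, :]
    # Smallest eigenvalues correspond to smallest frequencies.
    order = np.argsort(omega, axis=None)[:n_bits]
    k_idx, d_idx = np.unravel_index(order, omega.shape)
    return {"mean": mean, "pcs": pcs, "lo": lo,
            "dims": d_idx, "omega": omega[k_idx, d_idx]}

def encode(model, X):
    """Map points (including unseen ones) to binary codes."""
    # Project and measure each coordinate from the left edge of its interval.
    Z = (X - model["mean"]) @ model["pcs"] - model["lo"]
    phase = np.pi / 2 + Z[:, model["dims"]] * model["omega"]
    return (np.sin(phase) > 0).astype(np.uint8)

rng = np.random.default_rng(0)
# Toy data: uniform on a 4 x 1 rectangle, so the long axis should get both bits.
X = rng.uniform([0.0, 0.0], [4.0, 1.0], size=(1000, 2))
model = train_spectral_hashing(X, n_bits=2)
codes = encode(model, X)
```

Encoding a new point needs only a projection and a few sine evaluations, which is what makes the out-of-sample extension cheap compared with Nyström.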

30 LabelMe dataset [Plot: proportion of good neighbors for Hamming distance < (threshold not recoverable) versus number of bits, for spectral hashing, RBM, boosting SSC, and LSH. Image panels: Input; Gist neighbors; Spectral hashing, 10 bits; Boosting, 10 bits.]

31 80 Million Image dataset [Image panels: Gist neighbors; Spectral hashing with 32 bits and with 64 bits.] Retrieval time: microseconds.

32 Limitations Three professors, no students. p(x) uniform assumption. Higher order dependencies between bits. Rounding problem.

33 Why this won't work: Grandmother cell reborn. Similarity between images. Noisy Labels. Efficient search.

34 Conclusions Brute force computer vision using hundreds of millions of images. Hashing allows retrieval in microseconds. Spectral hashing - simple learning that outperforms the state-of-the-art. Code Available: Google spectral hashing


WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WITH IMPLICATIONS FOR TRAINING Sanjeev Arora, Yingyu Liang & Tengyu Ma Department of Computer Science Princeton University Princeton, NJ 08540, USA {arora,yingyul,tengyu}@cs.princeton.edu

More information

Kernel Density Topic Models: Visual Topics Without Visual Words

Kernel Density Topic Models: Visual Topics Without Visual Words Kernel Density Topic Models: Visual Topics Without Visual Words Konstantinos Rematas K.U. Leuven ESAT-iMinds krematas@esat.kuleuven.be Mario Fritz Max Planck Institute for Informatics mfrtiz@mpi-inf.mpg.de

More information

THE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING

THE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING THE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING Luis Rademacher, Ohio State University, Computer Science and Engineering. Joint work with Mikhail Belkin and James Voss This talk A new approach to multi-way

More information

A summary of Deep Learning without Poor Local Minima

A summary of Deep Learning without Poor Local Minima A summary of Deep Learning without Poor Local Minima by Kenji Kawaguchi MIT oral presentation at NIPS 2016 Learning Supervised (or Predictive) learning Learn a mapping from inputs x to outputs y, given

More information

ECE521 Lecture 7/8. Logistic Regression

ECE521 Lecture 7/8. Logistic Regression ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression

More information

Neural networks and optimization

Neural networks and optimization Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional

More information

Data-dependent representations: Laplacian Eigenmaps

Data-dependent representations: Laplacian Eigenmaps Data-dependent representations: Laplacian Eigenmaps November 4, 2015 Data Organization and Manifold Learning There are many techniques for Data Organization and Manifold Learning, e.g., Principal Component

More information

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi

More information

Deep Learning & Neural Networks Lecture 2

Deep Learning & Neural Networks Lecture 2 Deep Learning & Neural Networks Lecture 2 Kevin Duh Graduate School of Information Science Nara Institute of Science and Technology Jan 16, 2014 2/45 Today s Topics 1 General Ideas in Deep Learning Motivation

More information

How to do backpropagation in a brain. Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto

How to do backpropagation in a brain. Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto 1 How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto What is wrong with back-propagation? It requires labeled training data. (fixed) Almost

More information

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu Dimension Reduction Techniques Presented by Jie (Jerry) Yu Outline Problem Modeling Review of PCA and MDS Isomap Local Linear Embedding (LLE) Charting Background Advances in data collection and storage

More information

When does a mixture of products contain a product of mixtures?

When does a mixture of products contain a product of mixtures? When does a mixture of products contain a product of mixtures? Jason Morton Penn State May 19, 2014 Algebraic Statistics 2014 IIT Joint work with Guido Montufar Supported by DARPA FA8650-11-1-7145 Jason

More information

Administration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6

Administration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6 Administration Registration Hw3 is out Due on Thursday 10/6 Questions Lecture Captioning (Extra-Credit) Look at Piazza for details Scribing lectures With pay; come talk to me/send email. 1 Projects Projects

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

Gaussian Cardinality Restricted Boltzmann Machines

Gaussian Cardinality Restricted Boltzmann Machines Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Gaussian Cardinality Restricted Boltzmann Machines Cheng Wan, Xiaoming Jin, Guiguang Ding and Dou Shen School of Software, Tsinghua

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models David Sontag New York University Lecture 4, February 16, 2012 David Sontag (NYU) Graphical Models Lecture 4, February 16, 2012 1 / 27 Undirected graphical models Reminder

More information

How to learn from very few examples?

How to learn from very few examples? How to learn from very few examples? Dengyong Zhou Department of Empirical Inference Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tuebingen, Germany Outline Introduction Part A

More information

Robust Classification using Boltzmann machines by Vasileios Vasilakakis

Robust Classification using Boltzmann machines by Vasileios Vasilakakis Robust Classification using Boltzmann machines by Vasileios Vasilakakis The scope of this report is to propose an architecture of Boltzmann machines that could be used in the context of classification,

More information

Knowledge Extraction from DBNs for Images

Knowledge Extraction from DBNs for Images Knowledge Extraction from DBNs for Images Son N. Tran and Artur d Avila Garcez Department of Computer Science City University London Contents 1 Introduction 2 Knowledge Extraction from DBNs 3 Experimental

More information

Jakub Hajic Artificial Intelligence Seminar I

Jakub Hajic Artificial Intelligence Seminar I Jakub Hajic Artificial Intelligence Seminar I. 11. 11. 2014 Outline Key concepts Deep Belief Networks Convolutional Neural Networks A couple of questions Convolution Perceptron Feedforward Neural Network

More information

Algorithms for Nearest Neighbors

Algorithms for Nearest Neighbors Algorithms for Nearest Neighbors Background and Two Challenges Yury Lifshits Steklov Institute of Mathematics at St.Petersburg http://logic.pdmi.ras.ru/~yura McGill University, July 2007 1 / 29 Outline

More information