When does a mixture of products contain a product of mixtures?


1 When does a mixture of products contain a product of mixtures? Jason Morton (Penn State), May 19, 2014. Algebraic Statistics 2014, IIT. Joint work with Guido Montúfar. Supported by DARPA FA.

2 Outline
1. Motivation and Introduction: Definitions; Problems
2. Inference functions and regions
3. Geometric Perspectives: Perfect reconstructibility; Modes; Zonosets and hyperplane arrangements; Linear threshold functions; Implications among properties

3 Restricted Boltzmann machines: building block of deep learning. Restricted Boltzmann machines are the canonical layer in deep learning models; a deep learning model is (approximately) a stack of RBMs. Deep learning models are machines for printing money: Geoff Hinton (Google); Yann LeCun and Rob Fergus (Facebook); Andrew Ng (Baidu); DeepMind ($500M acquisition by Google to play Space Invaders). "Last year, the cost of a top, world-class deep learning expert was about the same as a top NFL quarterback prospect." - Peter Lee, head of Microsoft Research, 1/27/2014.

4 But does it work in theory? Dominant technology for speech recognition, image recognition, and an expanding range of applications. Deployed at huge scale at Google, Facebook, Microsoft, etc. State-of-the-art results on many tasks. But we can't really prove much about how they work. "The theory of deep learning is a wide open field. Everything is up for the taking. Go for it." - Yann LeCun, 5/15/2014, on Reddit Machine Learning.

5 Mixture of products M_{n,k}. Probability distributions on n binary variables X_1, ..., X_n. A product distribution factors: p(x) = p(x_1) ⋯ p(x_n). A k-mixture of products M_{n,k}, a naïve Bayes model, is a convex combination of k product distributions, each from the closure of the n-dimensional exponential family p_B(x) = (1/Z(B)) exp(B·x): think of k pockets, each with n different-colored biased coins. The Zariski closure of M_{n,k} in complex projective space is the kth secant variety of the Segre product of n copies of P^1. dim(M_{n,k}) = dim(closure of M_{n,k}) = min{nk + (k−1), 2^n − 1}, the number of parameters or the ambient dimension, except in the degenerate case (n, k) = (4, 3) [Catalisano et al. 2011]. Even a full-dimensional M_{n,k} has a large complement in the simplex until k approaches 2^{n−1} [Montúfar 2010].
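As a concrete illustration (a minimal sketch with made-up parameters, not code from the talk), a mixture of product distributions can be tabulated directly for small n:

```python
import itertools
import numpy as np

def product_distribution(theta):
    """Full table of an independent (product) distribution on {0,1}^n.

    theta[i] = P(X_i = 1).  Returns a vector of length 2^n indexed by the
    binary strings x in lexicographic order.
    """
    n = len(theta)
    p = np.empty(2 ** n)
    for idx, x in enumerate(itertools.product([0, 1], repeat=n)):
        p[idx] = np.prod([t if xi else 1 - t for t, xi in zip(theta, x)])
    return p

def mixture_of_products(lam, thetas):
    """k-mixture of product distributions (the naive Bayes model M_{n,k})."""
    lam = np.asarray(lam) / np.sum(lam)  # mixture weights
    return sum(l * product_distribution(th) for l, th in zip(lam, thetas))

# Example: a 2-mixture on 3 bits; the result sums to 1.
p = mixture_of_products([0.4, 0.6], [[0.9, 0.9, 0.1], [0.2, 0.1, 0.8]])
print(p.sum())  # 1.0 up to floating point
```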

6 Product of mixtures of products: RBM_{n,m}. Product of experts: the Hadamard (pointwise) product of m different M_{n,2}'s. The RBM model with n visible and m hidden binary units is the set RBM_{n,m} of distributions on {0,1}^n which are limits of p(v) = (1/Z(W,B,C)) Σ_{h ∈ {0,1}^m} exp(h^T W v + B^T v + C^T h), with visible units v ∈ {0,1}^n and hidden units h ∈ {0,1}^m, where W ∈ R^{m×n} is a matrix of interaction weights, B ∈ R^n is the visible bias vector, C ∈ R^m is the hidden bias vector, and Z = Σ_{v,h} exp(h^T W v + B^T v + C^T h) normalizes. Usually has the expected dimension [Cueto, Morton, Sturmfels].
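A brute-force sketch of this marginalization for tiny n and m (the parameter values below are arbitrary, for illustration only):

```python
import itertools
import numpy as np

def rbm_distribution(W, B, C):
    """Brute-force probability table of an RBM with n visible, m hidden units.

    p(v) is proportional to sum_h exp(h^T W v + B^T v + C^T h).  Only feasible
    for small n and m, since it enumerates all visible and hidden states.
    """
    m, n = W.shape
    vs = np.array(list(itertools.product([0, 1], repeat=n)))  # 2^n x n
    hs = np.array(list(itertools.product([0, 1], repeat=m)))  # 2^m x m
    # unnormalized joint: rows index h, columns index v
    energy = hs @ W @ vs.T + (vs @ B)[None, :] + (hs @ C)[:, None]
    joint = np.exp(energy)
    p_v = joint.sum(axis=0)
    return p_v / p_v.sum()

# Example with small random parameters (illustrative values only).
rng = np.random.default_rng(0)
p = rbm_distribution(rng.normal(size=(2, 3)), rng.normal(size=3), rng.normal(size=2))
print(p.sum())  # 1.0
```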

7 How are RBMs better? [Figure: mixture of products M_{6,k} and product of mixtures RBM_{6,4} drawn as graphical models, with parameters C, W, B labeled. Dark nodes represent hidden units, light nodes represent visible units.]

8 Problem (1): When does the mixture of product distributions M_{n,k} contain the product of mixtures of product distributions RBM_{n,m}, and vice versa? A result: the number of parameters of the smallest mixture of products M_{n,k} containing RBM_{n,m} grows exponentially in the number of parameters of the RBM, for any fixed ratio 0 < m/n < ∞. Reason: the RBM can represent distributions with many more Hamming-local maxima. The result comes from polyhedral approximations of the sets of probability distributions representable by each model.

9 Smallest mixture of products representing an RBM. [Heat map of log k(n, m), where k(n, m) = min{k ∈ N : M_{n,k} ⊇ RBM_{n,m}}, plotted over n and m.] An RBM of dimension c has hyperparameters n and m satisfying nm + n + m = c (dashed hyperbola). Fixing the dimension, the RBMs which are hardest to represent as mixtures of product distributions are those with m/n ≈ 1.

10 Many related problems. Problem: What sets of length-n binary vectors are (2) perfectly reconstructible by an RBM with m hidden units? (3) the outputs of n linear threshold functions with m input bits? (4) the modes or strong modes (Hamming-local maxima) of distributions represented by an RBM with m hidden units? Probability distributions with many strong modes are written much more compactly as RBMs than as mixtures of products, e.g. those supported on even-parity bitstrings. The key to understanding how the problems relate, and to proving the superiority of RBMs at modeling complex distributions, is thinking about modes and inference functions.

11 Modes and hyperplanes. Modes are described by linear inequalities of the form p(x) > p(x′) and define polyhedral approximations of probability models. Closely related to binary classification problems and separation of vertex sets of hypercubes by hyperplane arrangements, leading to problems such as Problem (5): What is the smallest arrangement of hyperplanes, if one exists, that slices each edge of a hypercube a given number of times? Goal: provide partial answers to these five problems by explaining connections.

12
1. Motivation and Introduction: Definitions; Problems
2. Inference functions and regions
3. Geometric Perspectives: Perfect reconstructibility; Modes; Zonosets and hyperplane arrangements; Linear threshold functions; Implications among properties

13 Inference functions of RBMs. The inference function of a probability model p_θ(v, h) with parameter θ ∈ Ω explains each value v by the most likely value of h: up_θ : v ↦ argmax_h p_θ(h | v), viewed either as up_θ : {0,1}^n → {0,1}^m or as up_θ : R^n → {0,1}^m. Each of the m RBM hidden units linearly divides the input space R^n according to its preferred state given the input.
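For an RBM the argmax decouples across hidden units, so up_θ is a coordinatewise threshold; a small sketch (toy parameters, not from the talk):

```python
import numpy as np

def rbm_up(W, C, v):
    """Most likely hidden state given visible v for an RBM (illustrative sketch).

    argmax_h h^T (W v + C) is attained coordinatewise: h_j = 1 iff (W v + C)_j > 0,
    so each hidden unit thresholds the input along its own hyperplane.
    """
    return (W @ np.asarray(v) + C > 0).astype(int)

# Example: three hidden units slicing {0,1}^2 (toy parameters).
W = np.array([[2.0, -1.0], [-1.0, 2.0], [1.0, 1.0]])
C = np.array([-0.5, -0.5, -1.5])
for v in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(v, rbm_up(W, C, v))
```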

14 Inference regions and distributed representations. up_θ : {0,1}^n → {0,1}^m, extended to up_θ : R^n → {0,1}^m. Together, the m hidden units partition the input space into inference regions where different joint hidden states are most likely. This defines a distributed encoding or distributed representation [Bengio 2009] of the input vector into the hidden nodes. Such a distributed representation is speculated to be a key to the model's efficacy.

15 Inference regions of mixture and RBM, for some θ. [Figure: the inference regions of M_{2,4}(θ) and RBM_{2,3}(θ′) in the input plane, labeled by the hidden states they select.] Both have 7 parameters. Both are universal approximators of distributions on {0,1}^2. They define very different inference regions.

16 RBM combines linear threshold inference functions. Definition: A linear threshold function (LTF) with m (binary) inputs is a function f : {0,1}^m → {−, +}; y ↦ sgn((Σ_{j ∈ [m]} w_j y_j) + b), where w ∈ R^m is called the weight vector and b ∈ R the bias. Identify −/+ and 0/1 vectors via − ↔ 0 and + ↔ 1. Choosing parameters W ∈ R^{m×n}, B ∈ R^n, C ∈ R^m, our model RBM_{n,m} defines the inference function up_{W,B,C} : R^n ⊇ {0,1}^n → {0,1}^m; v ↦ argmax_{h ∈ {0,1}^m} h^T (Wv + C). The log of the number of LTFs with m inputs is asymptotically of order m^2; the exact number is known only for m ≤ 9.

17 Inference functions: linear threshold functions. The partition is given by the intersection of an affine space and the normal fan of an m-cube (the orthants of R^m): the inference regions are the preimages of the orthants of R^m under the affine map ψ : R^n → R^m; v ↦ Wv + C. Number of inference regions ≤ Σ_{i=0}^{rank(W)} (m choose i), the number of orthants of R^m intersected by a generic rank(W)-dimensional affine subspace. When rank(W) < m (e.g. if m > n), the image of the map ψ does not intersect all orthants of R^m, so there are empty inference regions, i.e., states h which are not the explanation of any input vector v.
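A quick empirical sanity check of this count (a sketch with random parameters; sampling can miss very small regions, so it only gives a lower bound on the number of nonempty regions):

```python
import numpy as np
from math import comb

# Count the distinct inference regions produced by psi(v) = W v + C for random
# parameters and compare with the bound sum_{i=0}^{rank(W)} C(m, i).
rng = np.random.default_rng(1)
n, m = 3, 5
W = rng.normal(size=(m, n))
C = rng.normal(size=m)

samples = rng.normal(scale=10.0, size=(200000, n))
patterns = {tuple(s) for s in (samples @ W.T + C > 0).astype(int)}

d = np.linalg.matrix_rank(W)
bound = sum(comb(m, i) for i in range(d + 1))
print(len(patterns), "regions found; bound:", bound)
```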

18 Inference functions of the mixture model. The mixture model M_{n,k} has a simpler inference function up_{λ,B} : v ↦ argmax_{i ∈ [k]} (B_i·v − log Z(B_i) + log λ_i), with mixture weights λ_i, parameters B_i ∈ R^n of each mixture component for i ∈ [k], and Z(B_i) = Σ_{v ∈ {0,1}^n} exp(B_i·v). The input space R^n is partitioned into at most k regions of linearity of the function v ↦ max{B_i·v − log Z(B_i) + log λ_i : i ∈ [k]}. The partition is given by the intersection of an affine space and the normal fan of a (k−1)-simplex.
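A direct sketch of this inference rule (toy parameters, small n so that Z(B_i) can be summed explicitly):

```python
import itertools
import numpy as np

def mixture_up(lam, Bs, v):
    """Inference function of the mixture model M_{n,k} (sketch).

    Returns the index of the most likely mixture component for input v:
    argmax_i  B_i . v - log Z(B_i) + log lambda_i.
    """
    v = np.asarray(v)
    n = len(v)
    cube = np.array(list(itertools.product([0, 1], repeat=n)))
    scores = []
    for l, B in zip(lam, Bs):
        logZ = np.log(np.exp(cube @ np.asarray(B)).sum())
        scores.append(np.asarray(B) @ v - logZ + np.log(l))
    return int(np.argmax(scores))

# Example: 3 components on 2 bits (toy parameters); at most k inference regions.
print(mixture_up([0.5, 0.3, 0.2], [[2.0, 2.0], [-2.0, 2.0], [0.0, -2.0]], (1, 1)))
```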

19 Comparing inference functions. Fixing the input space dimension n, the number of inference regions in R^n realizable by RBM_{n,m} is of order Θ((m choose min{n,m})): exponential in the number of parameters of the model. In contrast, the number of inference regions realizable by the mixture M_{n,k} is linear in the number of parameters of the model. So distributed representations can learn different explanations for a number of observations that is exponential in the number of model parameters. Next, examine more closely the codes (sets of bitstrings) an RBM prefers.

20
1. Motivation and Introduction: Definitions; Problems
2. Inference functions and regions
3. Geometric Perspectives: Perfect reconstructibility; Modes; Zonosets and hyperplane arrangements; Linear threshold functions; Implications among properties

21 Six 3-parameter properties of sets of binary vectors. Let n, m be non-negative integers and C be a subset of {0,1}^n.
LTC: The set C is an (n, m)-linear threshold code, i.e., the image of n linear threshold functions with m inputs.
HP: There exists an arrangement A of n hyperplanes in R^m such that the vertices of the m-dimensional unit cube intersect exactly the C-cells of A.
ZP: There is an affine image of the vertices of an m-cube (an m-zonoset) in R^n which intersects exactly the C-orthants of R^n.
SM: An RBM with n visible and m hidden nodes can represent a distribution with set of strong modes C.
PR: The set C is the set of perfectly reconstructible inputs of an RBM with n visible and m hidden nodes.
SP: An RBM with n visible and m hidden nodes can represent a distribution which is strictly positive on C and zero elsewhere.

22 Perfect reconstructibility. Similarly to the up_θ inference function, a model p_θ(v, h) defines a down_θ inference function: down_θ outputs the most likely visible state argmax_v p_θ(v | h) given a hidden state h. Definition: Given a probability model p_θ(v, h) on v ∈ X and h ∈ Y, a collection of states C ⊆ X is perfectly reconstructible if there is a choice of the parameter θ for which down_θ(up_θ(v)) = v for all v ∈ C.
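For an RBM both inference directions are coordinatewise thresholds, so the definition can be checked directly; a toy sketch (the parameters below are invented for illustration):

```python
import numpy as np

def rbm_down(W, B, h):
    """Most likely visible state given hidden h (coordinatewise threshold)."""
    return (np.asarray(h) @ W + B > 0).astype(int)

def rbm_up(W, C, v):
    """Most likely hidden state given visible v."""
    return (W @ np.asarray(v) + C > 0).astype(int)

def perfectly_reconstructs(W, B, C, codewords):
    """Check down(up(v)) == v for every v in the candidate set (a sketch)."""
    return all(np.array_equal(rbm_down(W, B, rbm_up(W, C, v)), v)
               for v in map(np.asarray, codewords))

# Toy check: two hidden units copying two visible bits (illustrative parameters).
W = np.array([[4.0, 0.0], [0.0, 4.0]])
B = np.array([-2.0, -2.0])
C = np.array([-2.0, -2.0])
print(perfectly_reconstructs(W, B, C, [(0, 0), (0, 1), (1, 0), (1, 1)]))  # True
```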

23 Is C reconstructible for some θ? [Figure: a candidate set C of visible states.]

24 Is C reconstructible for some θ? [Figure: C and its image up_θ(C) among the hidden states.]

25 Is C reconstructible for some θ? [Figure: C, up_θ(C), and down_θ(up_θ(C)).]

26 Perfect reconstructibility. The ability to reconstruct input vectors is used to evaluate the performance of RBMs in practice: it is intuitive and can be tested more cheaply than comparing probability distributions. Taking this seriously leads to autoencoder-like training algorithms, where we minimize reconstruction error. Which subsets of {0,1}^n can be reconstructed? Write the joint distribution on hidden and visible states {p_θ(v, h)}_{v,h} as a matrix with rows labeled by h and columns by v. Then C is perfectly reconstructible iff for some θ, p_θ(v, up_θ(v)) is the unique maximal entry in the up_θ(v)-row (and in the v-column) for all v ∈ C.

27 Represent distributions with many modes? Let d_H(x, y) be the Hamming distance between binary strings x and y: the number of bits that must be flipped to turn x into y. Definition: A binary vector x is a mode of a distribution p if p(x) > p(y) for all y ∈ X with d_H(y, x) = 1, and a strong mode if p(x) > Σ_{y ∈ X : d_H(y,x)=1} p(y). The modes of a distribution are the Hamming-locally most likely events in the space of possible events. Modes are closely related to the support sets and boundaries of statistical models, which have been studied especially for hierarchical and graphical models without hidden variables. A complex distribution has many modes.
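These two conditions are easy to test on an explicit probability table; a small sketch (the distribution below is made up for illustration):

```python
import itertools

def flip(x, i):
    y = list(x)
    y[i] = 1 - y[i]
    return tuple(y)

def is_mode(p, x):
    """x is a mode of p if p(x) exceeds p at every Hamming neighbor."""
    return all(p[flip(x, i)] < p[x] for i in range(len(x)))

def is_strong_mode(p, x):
    """x is a strong mode if p(x) exceeds the *sum* over Hamming neighbors."""
    return p[x] > sum(p[flip(x, i)] for i in range(len(x)))

# Toy distribution on {0,1}^3 concentrated near 000 and 111 (illustrative numbers).
p = {x: 0.01 for x in itertools.product([0, 1], repeat=3)}
p[(0, 0, 0)] = 0.47
p[(1, 1, 1)] = 0.47
print([x for x in p if is_strong_mode(p, x)])  # [(0, 0, 0), (1, 1, 1)]
```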

28 Polyhedral approximation with (strong) modes. Consider the set G_C of distributions which have modes, or H_C of distributions which have strong modes, exactly at the bitstrings in a code C (so the codewords are Hamming distance ≥ 2 apart). The closures (of G_C and H_C, respectively) are convex mode polytopes inscribed in the probability simplex. The sets of modes not realizable by a probability model give a full-dimensional polyhedral approximation of the model's complement. Test cases: binary strings with an even or odd number of ones. These are the maximal such subsets of {0,1}^n, with cardinality 2^{n−1} and minimum distance two.

29 Mode polytopes on {0,1}^2. [Figure: the probability simplex with vertices δ_(00), δ_(01), δ_(10), δ_(11), the independence model M_{2,1}, and the mode polytopes G_2^+ and G_2^−.] The polytopes at the top and bottom contain the distributions with two modes, on even- or odd-parity bitstrings respectively.

30 Mode polytopes on {0,1}^3. The set of distributions on {0,1}^3 with modes on the even-weight strings is the intersection of the probability simplex and 12 open half-spaces defined by p(x) > p(y), for each even-weight x and each y with d_H(x, y) = 1. The closure of this set is a convex polytope G_3^+ with 19 vertices, representing 1.79% of the simplex. The set H_3^+ of distributions with strong modes at the four even-parity bitstrings is the intersection of the probability simplex and the 4 half-spaces defined by p(x) > Σ_{y : d_H(x,y)=1} p(y) for x of even parity. We use strong mode polytopes because they have fewer facets.
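The 1.79% figure can be checked approximately by Monte Carlo (a sketch: uniform Dirichlet samples on the simplex, testing the 12 mode inequalities):

```python
import itertools
import numpy as np

# Estimate the fraction of the probability simplex on {0,1}^3 occupied by G_3^+,
# the distributions whose modes are the even-weight strings.  The slide reports
# about 1.79%; uniform sampling should land in the same ballpark (this is an
# approximation, not an exact volume computation).
rng = np.random.default_rng(2)
states = list(itertools.product([0, 1], repeat=3))
index = {x: i for i, x in enumerate(states)}
even = [x for x in states if sum(x) % 2 == 0]

def in_G3_plus(p):
    for x in even:
        for j in range(3):
            y = list(x)
            y[j] = 1 - y[j]
            if not p[index[x]] > p[index[tuple(y)]]:
                return False
    return True

samples = rng.dirichlet(np.ones(8), size=200000)
print(np.mean([in_G3_plus(p) for p in samples]))  # roughly 0.018
```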

31 Modes of mixtures of products. Theorem: The sets C of strong modes of distributions in M_{n,k} are exactly the sets of strings in X_1 × ⋯ × X_n of minimum Hamming distance at least two and cardinality at most k. Furthermore, if p ∈ M_{n,k} has strong modes C, then every c ∈ C is the mode of one mixture component of p. For example, although M_{3,3} is full dimensional in the simplex Δ_{2^3 − 1}, its complement contains points arbitrarily close to the uniform distribution!

32 Modes of RBMs. Problem: What is the smallest m ∈ N for which RBM_{n,m} contains a distribution with l (strong) modes? In particular, what is the smallest m for which the model RBM_{n,m} can represent the parity function? Characterizing the sets of modes realizable by RBMs is a more complex problem than for mixtures of product distributions. It is useful to describe it in terms of point configurations called zonosets, hyperplane arrangements, or linear threshold functions.

33 Zonosets. Zonotopes are, equivalently, images of n-cubes under affine projection or Minkowski sums of line segments, and they encode hyperplane arrangements. Zonosets remember the generating points (interior points, multiplicity). Parameters W, B define the projection. Definition: Let m ≥ 0, n > 0, W_i ∈ R^n for all i ∈ [m], and B ∈ R^n. The multiset Z = {Σ_{i ∈ I} W_i + B}_{I ⊆ [m]} is called an m-zonoset. The convex hull of a zonoset is a zonotope.
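A zonoset is just the image of the vertices of an m-cube under an affine map, which is one line of code (the generators below are illustrative):

```python
import itertools
import numpy as np

def zonoset(W, B):
    """All 2^m points sum_{i in I} W_i + B for I a subset of [m] (an m-zonoset).

    W has shape (m, n); each row is a generator W_i in R^n, and B in R^n shifts
    the configuration.  The convex hull of the returned points is a zonotope.
    """
    m, _ = W.shape
    subsets = np.array(list(itertools.product([0, 1], repeat=m)))  # 2^m x m
    return subsets @ W + B

# Example: a 3-zonoset in the plane (toy generators).
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
B = np.array([-1.0, -1.0])
print(zonoset(W, B))
```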

34 Theorem: Connecting properties of codes. Let C ⊆ {0,1}^n be a binary code of minimum Hamming distance at least two. If the model RBM_{n,m} contains a distribution with strong modes C, or C has cardinality 2^m and is perfectly reconstructible by RBM_{n,m}, then there is an m-zonoset with a point in each C-orthant of R^n. If there is an m-zonoset intersecting exactly the C-orthants of R^n at points of equal l_1-norm, then RBM_{n,m} contains a distribution with strong modes C and, furthermore, C is perfectly reconstructible.

35 Hyperplane arrangements. A hyperplane arrangement A in R^n is a finite set of (affine) hyperplanes {H_i}_{i ∈ [k]} in R^n. Choosing an orientation for each hyperplane, each vector x ∈ R^n gets a sign vector sgn_A(x) ∈ {−, 0, +}^k, indicating whether x lies on the negative side, inside, or on the positive side of each H_i. The set of all vectors in R^n with the same sign vector is called a cell of A.

36 Linear threshold codes. Recall the Definition: A linear threshold function (LTF) with m (binary) inputs is a function f : {0,1}^m → {−, +}; y ↦ sgn((Σ_{j ∈ [m]} w_j y_j) + b), where w ∈ R^m is called the weight vector and b ∈ R the bias. When f(¬x_1, ..., ¬x_m) = −f(x_1, ..., x_m) for all x ∈ {0,1}^m, the LTF is self-dual. An LTF with an equal number of positive and negative points separates every input from its opposite and is self-dual.

37 Linear threshold codes. Definition: A subset C ⊆ {0,1}^n = {−, +}^n is an (n, m)-linear threshold code (LTC) if there exist n linear threshold functions f_i : {0,1}^m → {0,1}, i ∈ [n], with {(f_1(y), f_2(y), ..., f_n(y)) ∈ {0,1}^n : y ∈ {0,1}^m} = C. Equivalently, C is an (n, m)-LTC if it is the image of a down inference function of RBM_{n,m}. If all f_i can be chosen self-dual, then C is called homogeneous.

38 LTC Example: n = 3 and m = 2. There are only two ways to linearly separate the vertices of the 2-cube into sets of cardinality two (up to opposites): the two splits along the coordinate directions (the diagonal split is the XOR pattern, which is not linear). These are the only possible columns of a homogeneous LTC with two inputs (up to opposites). But the even or odd parity code is not a (3, 2)-LTC; a brute-force check of this is sketched below. So there does not exist a 2-zonoset with points in the four even, or odd, orthants of R^3, and RBM_{3,2} does not contain any distributions with four strong modes. But there are 104 ways to linearly separate the vertices of the 3-cube; a good choice is shown on the next slide.
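A brute-force verification of the claim that the parity code is not a (3, 2)-LTC (a sketch; the weight grid below is assumed large enough to realize every LTF on two inputs, which holds since only XOR and XNOR are non-threshold):

```python
import itertools

# Enumerate all linear threshold functions on 2 inputs via a small weight/bias
# grid, then check whether any triple of them has the even- or odd-parity code
# on 3 bits as its image.
inputs = list(itertools.product([0, 1], repeat=2))

ltfs = set()
grid = [x / 2 for x in range(-6, 7)]
for w1 in grid:
    for w2 in grid:
        for b in grid:
            vals = tuple(int(w1 * y1 + w2 * y2 + b > 0) for y1, y2 in inputs)
            ltfs.add(vals)

even = {c for c in itertools.product([0, 1], repeat=3) if sum(c) % 2 == 0}
odd = {c for c in itertools.product([0, 1], repeat=3) if sum(c) % 2 == 1}

found = any(
    {(f1[i], f2[i], f3[i]) for i in range(4)} in (even, odd)
    for f1, f2, f3 in itertools.product(ltfs, repeat=3)
)
print(found)  # expected False: parity is not a (3, 2)-LTC
```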

39 LTC Example: n = 4 and m = 3. The 8 vertices of the 3-cube lie in the 8 even-parity cells of an arrangement of four hyperplanes, corresponding to a (4, 3)-LTC given by four explicit LTFs. This arrangement corresponds to a 3-zonoset with points in the 8 even orthants of R^4, realizable by an explicit choice of w and b.
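The explicit parameters did not survive transcription; here is one concrete choice (an assumption of mine, not necessarily the slide's w and b) together with a check that its 3-zonoset hits exactly the 8 even orthants of R^4:

```python
import itertools
import numpy as np

# One concrete realization: a 3-zonoset in R^4 whose 8 points land in exactly
# the 8 even-parity orthants, so the even code on 4 bits is a (4, 3)-LTC.
W = np.array([[-2, 2, -2, -2],
              [-2, -2, 2, -2],
              [-2, -2, -2, 2]], dtype=float)   # rows are the generators W_i
B = np.array([3, 1, 1, 1], dtype=float)

points = np.array(list(itertools.product([0, 1], repeat=3))) @ W + B
signs = (points > 0).astype(int)

# Every sign pattern should have even parity, and all 8 should be distinct.
print(all(s.sum() % 2 == 0 for s in signs), len({tuple(s) for s in signs}))
```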

40 LTC Example, continued. These four slicings of the 3-cube can be repeated to obtain: [Figure: the repeated construction.]

41 Smallest mixture of products representing an RBM. [Heat map of log k(n, m) = min{k ∈ N : M_{n,k} ⊇ RBM_{n,m}}, as on slide 9.] To improve on our result, just find an RBM_{n,m} where m/n is closer to one. Hint: 5/6 fails.

42 Six 3-parameter properties of sets of binary vectors. Let n, m be non-negative integers and C be a subset of {0,1}^n.
LTC: The set C is an (n, m)-linear threshold code, i.e., the image of n linear threshold functions with m inputs.
HP: There exists an arrangement A of n hyperplanes in R^m such that the vertices of the m-dimensional unit cube intersect exactly the C-cells of A.
ZP: There is an affine image of the vertices of an m-cube (an m-zonoset) in R^n which intersects exactly the C-orthants of R^n.
SM: An RBM with n visible and m hidden nodes can represent a distribution with set of strong modes C.
PR: The set C is the set of perfectly reconstructible inputs of an RBM with n visible and m hidden nodes.
SP: An RBM with n visible and m hidden nodes can represent a distribution which is strictly positive on C and zero elsewhere.

43 [Diagram: implications among SP, SM, PR, HP, LTC, and ZP, with d_H(C) ≥ 2 and l_1 conditions labeling the arrows.] Theorem: Fix integers n and m. For any C ⊆ {0,1}^n, the following hold.
1. The properties LTC, HP, and ZP are equivalent.
2. If C satisfies PR or SM, then it is contained in an LTC set.
3. If the vectors in C are at least Hamming distance 2 apart, then SP implies both SM and PR.
4. If the vectors in C are at least Hamming distance 2 apart and C satisfies an l_1 property, then LTC implies SP.
