Introduction to the Tensor Train Decomposition and Its Applications in Machine Learning


1 Introduction to the Tensor Train Decomposition and Its Applications in Machine Learning
Anton Rodomanov
Higher School of Economics, Russia
Bayesian methods research group
14 March 2016, HSE Seminar on Applied Linear Algebra, Moscow, Russia

2 Outline
1 Tensor Train Format
2 ML Application 1: Markov Random Fields
3 ML Application 2: TensorNet

3 What is a tensor?
Tensor = multidimensional array: A = [A(i_1, \dots, i_d)], i_k \in \{1, \dots, n_k\}.
Terminology:
dimensionality = d (number of indices);
size = n_1 \times \dots \times n_d (number of nodes along each axis).
Case d = 1: vector; d = 2: matrix.

4 Curse of dimensionality
Number of elements = n^d (exponential in d).
Even for n = 2, a tensor with large d already requires petabytes of memory.
Cannot work with such tensors using standard methods.

5 Tensor rank decomposition [Hitchcock, 1927]
Recall the rank decomposition for matrices:
A(i_1, i_2) = \sum_{\alpha=1}^{r} U(i_1, \alpha) V(i_2, \alpha).
This can be generalized to tensors.
Tensor rank decomposition (canonical decomposition):
A(i_1, \dots, i_d) = \sum_{\alpha=1}^{R} U_1(i_1, \alpha) \dots U_d(i_d, \alpha).
The minimal possible R is called the (canonical) rank of the tensor A.
(+) No curse of dimensionality.
(-) Ill-posed problem [de Silva, Lim, 2008].
(-) Rank R should be known in advance for many methods.
(-) Computation of R is NP-hard [Hillar, Lim, 2013].

6 Unfolding matrices: definition
Every tensor A has d - 1 unfolding matrices:
A_k := [A(i_1 \dots i_k; i_{k+1} \dots i_d)], where A(i_1 \dots i_k; i_{k+1} \dots i_d) := A(i_1, \dots, i_d).
Here i_1 \dots i_k and i_{k+1} \dots i_d are row and column (multi)indices; A_k is a matrix of size M_k \times N_k with M_k = \prod_{s=1}^{k} n_s, N_k = \prod_{s=k+1}^{d} n_s.
This is just a reshape.
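In NumPy this reshape can be written in one line. The sketch below is illustrative only; it assumes column-major (Fortran) ordering of the multi-indices, so the first index in each multi-index varies fastest.

```python
import numpy as np

def unfolding(A, k):
    """Return the k-th unfolding A_k of shape (n_1*...*n_k, n_{k+1}*...*n_d)."""
    M_k = int(np.prod(A.shape[:k]))
    # column-major reshape: the first index in each multi-index varies fastest
    return A.reshape(M_k, -1, order='F')

A = np.random.rand(2, 3, 4, 5)    # a 4-dimensional tensor
print(unfolding(A, 2).shape)      # (6, 20)
```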

7 Unfolding matrices: example
Consider A = [A(i, j, k)] given by its elements:
A(1, 1, 1) = 111, A(2, 1, 1) = 211, A(1, 2, 1) = 121, A(2, 2, 1) = 221,
A(1, 1, 2) = 112, A(2, 1, 2) = 212, A(1, 2, 2) = 122, A(2, 2, 2) = 222.
Then (with the first index in each multi-index varying fastest)
A_1 = [A(i; jk)] = \begin{bmatrix} 111 & 121 & 112 & 122 \\ 211 & 221 & 212 & 222 \end{bmatrix},
A_2 = [A(ij; k)] = \begin{bmatrix} 111 & 112 \\ 211 & 212 \\ 121 & 122 \\ 221 & 222 \end{bmatrix}.

8 Tensor Train decomposition: motivation
Main idea: variable splitting.
Consider a rank decomposition of an unfolding matrix:
A(i_1 i_2; i_3 i_4 i_5 i_6) = \sum_{\alpha_2} U(i_1 i_2; \alpha_2) V(i_3 i_4 i_5 i_6; \alpha_2).
On the left: a 6-dimensional tensor; on the right: a 3-dimensional and a 5-dimensional one. The dimension has been reduced! Proceed recursively.

9 Tensor Train decomposition [Oseledets, 2011]
TT-format for a tensor A:
A(i_1, \dots, i_d) = \sum_{\alpha_0, \dots, \alpha_d} G_1(\alpha_0, i_1, \alpha_1) G_2(\alpha_1, i_2, \alpha_2) \dots G_d(\alpha_{d-1}, i_d, \alpha_d).
This can be written compactly as a matrix product:
A(i_1, \dots, i_d) = G_1[i_1] G_2[i_2] \dots G_d[i_d],
where G_k[i_k] is a matrix of size r_{k-1} \times r_k (with r_0 = r_d = 1).
Terminology:
G_k: TT-cores (collections of matrices);
r_k: TT-ranks;
r = \max r_k: maximal TT-rank.
The TT-format uses O(d n r^2) memory to store O(n^d) elements. It is efficient only if the ranks are small.
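As a quick illustration of the matrix-product form, the following NumPy sketch evaluates a single element of a TT tensor. It assumes each core is stored as a 3-way array of shape (r_{k-1}, n_k, r_k); this is a minimal sketch, not a library implementation.

```python
import numpy as np

def tt_element(cores, index):
    """Compute A(i_1, ..., i_d) as the product G_1[i_1] G_2[i_2] ... G_d[i_d]."""
    result = np.ones((1, 1))
    for G, i in zip(cores, index):
        result = result @ G[:, i, :]   # multiply by the r_{k-1} x r_k matrix G_k[i_k]
    return result[0, 0]
```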

10 TT-format: example
Consider a tensor:
A(i_1, i_2, i_3) := i_1 + i_2 + i_3, i_1 \in \{1, 2, 3\}, i_2 \in \{1, 2, 3, 4\}, i_3 \in \{1, 2, 3, 4, 5\}.
Its TT-format: A(i_1, i_2, i_3) = G_1[i_1] G_2[i_2] G_3[i_3], where
G_1[i_1] := \begin{bmatrix} i_1 & 1 \end{bmatrix}, \quad G_2[i_2] := \begin{bmatrix} 1 & 0 \\ i_2 & 1 \end{bmatrix}, \quad G_3[i_3] := \begin{bmatrix} 1 \\ i_3 \end{bmatrix}.
Check:
A(i_1, i_2, i_3) = \begin{bmatrix} i_1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ i_2 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ i_3 \end{bmatrix} = \begin{bmatrix} i_1 + i_2 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ i_3 \end{bmatrix} = i_1 + i_2 + i_3.

11 TT-format: example
Consider a tensor:
A(i_1, i_2, i_3) := i_1 + i_2 + i_3, i_1 \in \{1, 2, 3\}, i_2 \in \{1, 2, 3, 4\}, i_3 \in \{1, 2, 3, 4, 5\}.
Its TT-format: A(i_1, i_2, i_3) = G_1[i_1] G_2[i_2] G_3[i_3], where
G_1 = \left( \begin{bmatrix} 1 & 1 \end{bmatrix}, \begin{bmatrix} 2 & 1 \end{bmatrix}, \begin{bmatrix} 3 & 1 \end{bmatrix} \right),
G_2 = \left( \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix} \right),
G_3 = \left( \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 4 \end{bmatrix}, \begin{bmatrix} 1 \\ 5 \end{bmatrix} \right).
The tensor has 3 \cdot 4 \cdot 5 = 60 elements. The TT-format uses 6 + 16 + 10 = 32 numbers to describe it.
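The example can be checked numerically. The sketch below builds the cores as 3-way arrays (with zero-based index arithmetic) and verifies both the values and the 32-number storage count; it is only an illustration of the formulas above.

```python
import numpy as np

# Cores for A(i_1, i_2, i_3) = i_1 + i_2 + i_3 (indices are 1-based in the formulas,
# so 1 is added when the cores are built).
G1 = np.array([[[i1, 1]] for i1 in range(1, 4)]).transpose(1, 0, 2)           # shape (1, 3, 2)
G2 = np.array([[[1, 0], [i2, 1]] for i2 in range(1, 5)]).transpose(1, 0, 2)   # shape (2, 4, 2)
G3 = np.array([[[1], [i3]] for i3 in range(1, 6)]).transpose(1, 0, 2)         # shape (2, 5, 1)

for i1 in range(3):
    for i2 in range(4):
        for i3 in range(5):
            value = (G1[:, i1, :] @ G2[:, i2, :] @ G3[:, i3, :])[0, 0]
            assert value == (i1 + 1) + (i2 + 1) + (i3 + 1)

print("storage:", G1.size + G2.size + G3.size, "numbers instead of", 3 * 4 * 5)  # 32 vs 60
```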

12 Finding a TT-representation of a tensor
General ways of building a TT-representation of a tensor:
Analytical formulas for the TT-cores.
TT-SVD algorithm [Oseledets, 2011]: exact quasi-optimal method; suitable only for small tensors (which fit into memory).
Interpolation algorithms: AMEn-cross [Dolgov & Savostyanov, 2013], DMRG [Khoromskij & Oseledets, 2010], TT-cross [Oseledets, 2010]: approximate, heuristically-based methods; can be applied to large tensors; no strong guarantees, but they work well in practice.
Operations on other tensors already given in the TT-format: addition, element-wise product, etc.

13 TT-interpolation: problem formulation
Input: a procedure A(i_1, \dots, i_d) for computing an arbitrary element of A.
Output: a TT-representation of a tensor B \approx A:
B(i_1, \dots, i_d) = G_1[i_1] \dots G_d[i_d].

14 Matrix interpolation: matrix-cross
Let A \in \mathbb{R}^{m \times n} with rank r. It admits a skeleton decomposition [Goreinov et al., 1997]: A = C \hat{A}^{-1} R, where
\hat{A} = A(I, J): an r \times r non-singular submatrix;
C = A(:, J): the m \times r matrix of columns containing \hat{A};
R = A(I, :): the r \times n matrix of rows containing \hat{A}.
Q: Which \hat{A} to choose? A: Any non-singular submatrix: if rank A = r, the decomposition is exact.
Q: What if rank A > r? Different \hat{A} will give different errors. A: Choose a maximal-volume submatrix [Goreinov, Tyrtyshnikov, 2001]. It can be found with the maxvol algorithm [Goreinov et al., 2008].
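A small NumPy sketch of the skeleton decomposition for an exactly rank-r matrix. Here the row and column indices are picked at random rather than by maxvol, which already suffices in the exact-rank case described above.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 50, 40, 5
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # exactly rank-r matrix

I = rng.choice(m, size=r, replace=False)   # row indices
J = rng.choice(n, size=r, replace=False)   # column indices
C = A[:, J]                                # m x r columns containing Ahat
R = A[I, :]                                # r x n rows containing Ahat
Ahat = A[np.ix_(I, J)]                     # r x r intersection submatrix

A_rec = C @ np.linalg.solve(Ahat, R)       # C Ahat^{-1} R
print(np.linalg.norm(A - A_rec) / np.linalg.norm(A))   # ~1e-14 if Ahat is non-singular
```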

15 TT interpolation with TT-cross: example
Hilbert tensor: A(i_1, i_2, \dots, i_d) := \frac{1}{i_1 + i_2 + \dots + i_d}.
(Table: TT-rank vs. time, number of iterations, and relative accuracy of the TT-cross approximation; the accuracy improves as the TT-rank grows, down to about 1e-09.)
[Oseledets & Tyrtyshnikov, 2009]

16 Operations: addition and multiplication by a number
Let C = A + B: C(i_1, \dots, i_d) = A(i_1, \dots, i_d) + B(i_1, \dots, i_d).
The TT-cores of C are as follows:
C_k[i_k] = \begin{bmatrix} A_k[i_k] & 0 \\ 0 & B_k[i_k] \end{bmatrix}, \quad k = 2, \dots, d-1,
C_1[i_1] = \begin{bmatrix} A_1[i_1] & B_1[i_1] \end{bmatrix}, \quad C_d[i_d] = \begin{bmatrix} A_d[i_d] \\ B_d[i_d] \end{bmatrix}.
The ranks are doubled.
Multiplication by a number: C = A \cdot \text{const}. Multiply only one core by const. The ranks do not increase.
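A minimal sketch of building these block cores in NumPy, assuming the cores of A and B are stored as 3-way arrays of shape (r_{k-1}, n_k, r_k) with d >= 2; the helper name tt_add is illustrative only.

```python
import numpy as np

def tt_add(a_cores, b_cores):
    """TT-cores of C = A + B: block-diagonal middle cores, stacked boundary cores."""
    d = len(a_cores)
    c_cores = []
    for k, (A_k, B_k) in enumerate(zip(a_cores, b_cores)):
        ra1, n, ra2 = A_k.shape
        rb1, _, rb2 = B_k.shape
        if k == 0:
            C_k = np.concatenate([A_k, B_k], axis=2)     # [A_1[i]  B_1[i]]
        elif k == d - 1:
            C_k = np.concatenate([A_k, B_k], axis=0)     # [A_d[i]; B_d[i]]
        else:
            C_k = np.zeros((ra1 + rb1, n, ra2 + rb2))
            C_k[:ra1, :, :ra2] = A_k                     # block-diagonal structure
            C_k[ra1:, :, ra2:] = B_k
        c_cores.append(C_k)
    return c_cores
```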

17 Operations: element-wise product
Let C = A \odot B: C(i_1, \dots, i_d) = A(i_1, \dots, i_d) \, B(i_1, \dots, i_d).
The TT-cores of C can be computed as follows:
C_k[i_k] = A_k[i_k] \otimes B_k[i_k],
where \otimes is the Kronecker product. As a result, rank(C) = rank(A) \cdot rank(B).
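This rule translates into one np.kron per core slice. A minimal sketch under the same 3-way core-storage assumption as above:

```python
import numpy as np

def tt_hadamard(a_cores, b_cores):
    """TT-cores of the element-wise product: C_k[i] = A_k[i] (Kronecker) B_k[i]."""
    c_cores = []
    for A_k, B_k in zip(a_cores, b_cores):
        ra1, n, ra2 = A_k.shape
        rb1, _, rb2 = B_k.shape
        C_k = np.empty((ra1 * rb1, n, ra2 * rb2))
        for i in range(n):
            C_k[:, i, :] = np.kron(A_k[:, i, :], B_k[:, i, :])
        c_cores.append(C_k)
    return c_cores
```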

18 TT-format: efficient operations
Operation      Output rank      Complexity
A \cdot const  r_A              O(d r_A)
A + const      r_A + 1          O(d n r_A^2)
A + B          r_A + r_B        O(d n (r_A + r_B)^2)
A \odot B      r_A r_B          O(d n r_A^2 r_B^2)
sum(A)         --               O(d n r_A^2)
...
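For instance, sum(A) reduces to summing every core over its mode index and multiplying the resulting small r_{k-1} x r_k matrices, which is where the O(d n r_A^2) entry in the table comes from. A minimal sketch:

```python
import numpy as np

def tt_sum(cores):
    """Sum of all elements of a TT tensor: product of the mode-summed cores."""
    result = np.ones((1, 1))
    for G in cores:
        result = result @ G.sum(axis=1)   # sum core over i_k, then contract the ranks
    return result[0, 0]
```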

19 TT-rounding
TT-rounding procedure: \tilde{A} = round(A, \varepsilon), \varepsilon = accuracy:
1 Maximally reduces the TT-ranks while ensuring that \| A - \tilde{A} \|_F \leq \varepsilon \| A \|_F.
2 Uses SVD for compression (like TT-SVD).
Rounding allows one to trade off accuracy vs. the maximal rank of the TT-representation.
Example: round(A + A, \varepsilon_{mach}) recovers the TT-ranks of A (instead of the doubled ranks) within machine accuracy \varepsilon_{mach}.

20 Example: multivariate integration
I(d) := \int_{[0,1]^d} \sin(x_1 + x_2 + \dots + x_d) \, dx_1 dx_2 \dots dx_d = \mathrm{Im}\left( \left( \frac{e^{i} - 1}{i} \right)^d \right).
Use Chebyshev quadrature with n = 11 nodes + TT-cross with r = 2.
(Table: I(d), relative accuracy, and computation time for several values of d.)
[Oseledets & Tyrtyshnikov, 2009]
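The closed-form reference value above is easy to evaluate directly (the quadrature + TT-cross part would need a TT toolbox and is not shown). A small sketch:

```python
import numpy as np

def exact_integral(d):
    """Reference value I(d) = Im( ((exp(i) - 1) / i)^d )."""
    return (((np.exp(1j) - 1) / 1j) ** d).imag

for d in (10, 100, 1000):
    print(d, exact_integral(d))
```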

21 Outline
1 Tensor Train Format
2 ML Application 1: Markov Random Fields
3 ML Application 2: TensorNet

22 Motivational example: image segmentation
Task: assign a label y_i to each pixel of an M \times N image. Let P(y) be the joint probability of a labelling y. Two extreme cases:
No assumptions about independence: O(K^{MN}) parameters (K = total number of labels); represents every distribution; intractable in general.
Everything is independent: P(y) = p_1(y_1) \dots p_{MN}(y_{MN}); O(MNK) parameters; represents only a small class of distributions; tractable.

23 Graphical models
Provide a convenient way to define probabilistic models using graphs. Two types: directed graphical models and Markov random fields. We will consider only (discrete) Markov random fields.
The edges represent dependencies between the variables. E.g., for image segmentation: a grid-structured graph over the pixel labels y_i.
A variable y_i is independent of the rest given its immediate neighbours.

24 Markov random fields
The model:
P(y) = \frac{1}{Z} \prod_{c \in C} \Psi_c(y_c),
Z: normalization constant;
C: set of all (maximal) cliques in the graph;
\Psi_c: non-negative functions, which are called factors.
Example (a 2 \times 2 grid over y_1, y_2, y_3, y_4):
P(y_1, y_2, y_3, y_4) = \frac{1}{Z} \Psi_1(y_1) \Psi_2(y_2) \Psi_3(y_3) \Psi_4(y_4) \, \Psi_{12}(y_1, y_2) \Psi_{24}(y_2, y_4) \Psi_{34}(y_3, y_4) \Psi_{13}(y_1, y_3).
The factors \Psi_{ij} measure the compatibility between variables y_i and y_j.
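For such a tiny model, the unnormalized probability and the normalization constant Z can be computed by brute force. The sketch below uses random non-negative factors; the names unary, pairwise and the label count K are illustrative assumptions, not values from the slides.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
K = 3                                                    # number of labels per variable (assumed)
unary = {i: rng.random(K) for i in range(4)}             # Psi_1, ..., Psi_4
edges = [(0, 1), (1, 3), (2, 3), (0, 2)]                 # the four pairwise factors
pairwise = {e: rng.random((K, K)) for e in edges}

def p_tilde(y):
    """Unnormalized probability: product of all unary and pairwise factors."""
    value = np.prod([unary[i][y[i]] for i in range(4)])
    for (i, j) in edges:
        value *= pairwise[(i, j)][y[i], y[j]]
    return value

Z = sum(p_tilde(y) for y in product(range(K), repeat=4))   # brute-force normalization constant
print("Z =", Z)
```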

25 Main problems of interest
Probabilistic model:
P(y) = \frac{1}{Z} \prod_{c \in C} \Psi_c(y_c) = \frac{1}{Z} \exp(-E(y)),
where E is the energy function: E(y) = \sum_{c \in C} \Theta_c(y_c), \Theta_c(y_c) = -\ln \Psi_c(y_c).
Maximum a posteriori (MAP) inference: y^* = \arg\max_y P(y) = \arg\min_y E(y).
Estimation of the normalization constant: Z = \sum_y \prod_{c \in C} \Psi_c(y_c).
Estimation of the marginal distributions: P(y_i) = \sum_{y \setminus y_i} P(y).

26 Tensorial perspective
The energy and the unnormalized probability are tensors:
E(y_1, \dots, y_n) = \sum_{c=1}^{m} \Theta_c(y_c), \quad P(y_1, \dots, y_n) = \prod_{c=1}^{m} \Psi_c(y_c),
both n-dimensional tensors, where y_i \in \{1, \dots, d\}.
In this language:
MAP-inference = finding the minimal element of E;
Normalization constant = the sum of all the elements of P.
Details: Putting MRFs on a Tensor Train [Novikov et al., ICML 2014].

27 Outline
1 Tensor Train Format
2 ML Application 1: Markov Random Fields
3 ML Application 2: TensorNet

28 Classification problem

29 Neural networks
Use a composite function: g(x) = f(W_2 f(W_1 x)), where h = f(W_1 x) \in \mathbb{R}^m is the hidden layer, h_k = f((W_1 x)_k), and f is the sigmoid f(x) = \frac{1}{1 + \exp(-x)}.
(Figure: a network with inputs x_1, \dots, x_n, hidden units h_1, \dots, h_m, and output g(x).)
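A minimal NumPy sketch of this forward pass with random placeholder weights; the layer sizes below are assumptions for illustration, not values from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n, m, out = 784, 100, 10                 # input, hidden and output sizes (assumed)
W1 = rng.standard_normal((m, n)) * 0.01
W2 = rng.standard_normal((out, m)) * 0.01

x = rng.standard_normal(n)
h = sigmoid(W1 @ x)                      # hidden layer: h_k = f((W1 x)_k)
g = sigmoid(W2 @ h)                      # network output g(x) = f(W2 f(W1 x))
```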

30 TensorNet: motivation
Goal: compress fully-connected layers. Why care about memory?
State-of-the-art deep neural networks don't fit into the memory of mobile devices.
Up to 95% of the parameters are in the fully-connected layers.
A shallow network with a huge fully-connected layer can achieve almost the same accuracy as an ensemble of deep CNNs [Ba and Caruana, 2014].

31 Tensor Train layer
Let x be the input and y be the output: y = Wx. The matrix W is represented in the TT-format:
y(i_1, \dots, i_d) = \sum_{j_1, \dots, j_d} G_1[i_1, j_1] \dots G_d[i_d, j_d] \, x(j_1, \dots, j_d).
Parameters: the TT-cores \{G_k\}_{k=1}^{d}.
Details: Tensorizing Neural Networks [Novikov et al., NIPS 2015].
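This sum is a sequence of small contractions, so it can be sketched in a few lines of NumPy. The sketch below is illustrative only (not the TensorNet implementation); it assumes each core is stored as a 4-way array of shape (r_{k-1}, n_k, m_k, r_k) with r_0 = r_d = 1, and that x and y are flattened in row-major (C) order over (j_1, ..., j_d) and (i_1, ..., i_d).

```python
import numpy as np

def tt_matvec(cores, x):
    """Multiply a TT-matrix, given by its 4-way cores, by a dense vector x."""
    # Running tensor R of shape (N, r, M_rest):
    #   N      = product of output modes processed so far,
    #   r      = current TT-rank,
    #   M_rest = product of input modes not processed yet.
    R = x.reshape(1, 1, -1)
    for G in cores:
        r_prev, n_k, m_k, r_next = G.shape
        N, _, M_rest = R.shape
        R = R.reshape(N, r_prev, m_k, M_rest // m_k)
        # contract over the rank index r_prev and the current input mode j_k
        R = np.einsum('apjs,pijq->aiqs', R, G)
        R = R.reshape(N * n_k, r_next, -1)
    return R.reshape(-1)

# Quick check against the dense matrix assembled from random cores (d = 3).
rng = np.random.default_rng(0)
cores = [rng.standard_normal(s) for s in [(1, 2, 3, 4), (4, 3, 2, 5), (5, 4, 3, 1)]]
W = np.einsum('aibp,pjcq,qkdr->ijkbcd', *cores).reshape(2 * 3 * 4, 3 * 2 * 3)
x = rng.standard_normal(3 * 2 * 3)
print(np.allclose(W @ x, tt_matvec(cores, x)))   # True
```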

32 Conclusions
The TT-format and the corresponding algorithms are a good way to work with huge tensors.
Memory and complexity depend on d linearly: no curse of dimensionality.
The TT-format is efficient only if the TT-ranks are small. This is the case in many applications.
Code is available in Python and MATLAB.
Thanks for attention!
