Machine Learning with Quantum-Inspired Tensor Networks

Machine Learning with Quantum-Inspired Tensor Networks. E.M. Stoudenmire and David J. Schwab, Advances in Neural Information Processing Systems 29, arXiv:1605.05775. RIKEN AICS, March 2017.

Collaboration with David J. Schwab (Northwestern and CUNY Graduate Center). Quantum Machine Learning workshop, Perimeter Institute, Aug 2016.

It is an exciting time for machine learning: language processing, self-driving cars, medicine, materials science / chemistry.

Progress in neural networks and deep learning. [Figure: neural network diagram.]

[Figure: a convolutional neural network shown side by side with a "MERA" tensor network.]

Are tensor networks useful for machine learning? This talk: tensor networks fit naturally into kernel learning (and have very strong connections to graphical models), and they bring many benefits for learning: linear scaling, adaptivity, and feature sharing.

Tensor networks sit at the interface of machine learning and physics: topological phases, phase transitions, neural nets, Boltzmann machines, quantum Monte Carlo and the sign problem, kernel learning, materials science and chemistry, supervised and unsupervised learning. This talk focuses on one such connection: tensor networks for supervised kernel learning.

What are Tensor Networks?

How do tensor networks arise in physics? Quantum systems are governed by the Schrödinger equation, $\hat{H}\psi = E\psi$. It is just an eigenvalue problem.

The problem is that $\hat{H}$ is a $2^N \times 2^N$ matrix, so the wavefunction $\psi$ in $\hat{H}\psi = E\psi$ has $2^N$ components.

It is natural to view the wavefunction as an order-N tensor: $|\psi\rangle = \sum_{\{s\}} \psi^{s_1 s_2 s_3 \cdots s_N}\, |s_1 s_2 s_3 \cdots s_N\rangle$.

[Diagram: the amplitude $\psi^{s_1 s_2 s_3 \cdots s_N}$ drawn as a single tensor node with N legs $s_1, s_2, \ldots, s_N$.]

The tensor components are related to probabilities of, e.g., Ising model spin configurations: each amplitude such as $\psi^{\uparrow\uparrow\downarrow\uparrow\uparrow\downarrow\uparrow}$ corresponds to one configuration of up and down spins.

We must find an approximation to this exponential problem. [Diagram: the order-N tensor $\psi^{s_1 s_2 \cdots s_N}$ to be approximated.]

Simplest approximation (mean field / rank-1): let the spins "do their own thing", $\psi^{s_1 s_2 \cdots s_6} \approx \psi^{s_1}\,\psi^{s_2}\,\psi^{s_3}\,\psi^{s_4}\,\psi^{s_5}\,\psi^{s_6}$. Expected values of individual spins are ok, but there are no correlations.
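As a concrete illustration (not from the talk), here is a minimal NumPy sketch of the rank-1 / mean-field idea: the full order-N tensor is built as an outer product of one two-component vector per spin, so every amplitude factorizes and no correlations between sites are present.

```python
import numpy as np
from functools import reduce

# Minimal sketch of the mean-field / rank-1 approximation: one length-2 vector
# per spin, and the full tensor is their outer (tensor) product.
N = 6
single_spin_vectors = [np.random.randn(2) for _ in range(N)]   # psi^{s_j} for each site

# Outer product of the per-site vectors -> an order-N tensor of shape (2,)*N
psi_mean_field = reduce(np.multiply.outer, single_spin_vectors)
print(psi_mean_field.shape)        # (2, 2, 2, 2, 2, 2)

# Every amplitude factorizes: psi[s1,...,sN] = psi^{s1} * ... * psi^{sN}
s = (0, 1, 1, 0, 0, 1)
assert np.isclose(psi_mean_field[s],
                  np.prod([v[si] for v, si in zip(single_spin_vectors, s)]))
```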

Restore correlations locally by connecting neighboring sites with bond indices $i_1, i_2, \ldots, i_5$: $\psi^{s_1 s_2 \cdots s_6} \approx \sum_{i_1 \cdots i_5} A^{s_1}_{i_1}\, A^{s_2}_{i_1 i_2}\, A^{s_3}_{i_2 i_3}\, A^{s_4}_{i_3 i_4}\, A^{s_5}_{i_4 i_5}\, A^{s_6}_{i_5}$. This is a matrix product state (MPS). Local expected values are accurate, and correlations decay with spatial distance.

" # # " " # "Matrix product state" because retrieving an element = product of matrices

= "##""# "Matrix product state" because retrieving an element = product of matrices

Tensor diagrams have a rigorous meaning: a node with one leg $j$ is a vector $v_j$, a node with two legs $i, j$ is a matrix $M_{ij}$, and a node with three legs $i, j, k$ is an order-3 tensor $T_{ijk}$.

Joining lines implies contraction over the shared index, and index names can be omitted: $\sum_j M_{ij} v_j$, $\sum_j A_{ij} B_{jk} = (AB)_{ik}$, $\sum_{ij} A_{ij} B_{ji} = \mathrm{Tr}[AB]$.
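These diagram rules map directly onto `numpy.einsum`; a small sketch of the three contractions above:

```python
import numpy as np

# The three diagram contractions written with einsum.
M, v = np.random.randn(3, 3), np.random.randn(3)
A, B = np.random.randn(3, 4), np.random.randn(4, 3)

Mv   = np.einsum('ij,j->i',   M, v)   # joining one line: matrix-vector product
AB   = np.einsum('ij,jk->ik', A, B)   # joining one line: matrix-matrix product
trAB = np.einsum('ij,ji->',   A, B)   # joining both lines: Tr[AB]

assert np.allclose(AB, A @ B) and np.isclose(trAB, np.trace(A @ B))
```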

MPS = matrix product state. The MPS approximation is controlled by the bond dimension m (analogous to an SVD rank): it compresses the $2^N$ parameters of the full tensor into roughly $N \cdot 2 \cdot m^2$ parameters, and with $m = 2^{N/2}$ an MPS can represent any tensor exactly.

Friendly neighborhood of "quantum state space". [Figure: nested regions of states reachable with bond dimensions m = 1, 2, 4, 8.]

MPS lead to powerful optimization techniques (the DMRG algorithm). White, PRL 69, 2863 (1992); Stoudenmire, White, PRB 87, 155137 (2013).

Besides MPS, other successful tensor networks are PEPS (for 2D systems) and MERA (for critical systems). Verstraete, Cirac, cond-mat/0407066 (2004); Evenbly, Vidal, PRB 79, 144108 (2009); Orus, Ann. Phys. 349, 117 (2014).

Supervised Kernel Learning

Supervised learning is a very common task: given labeled training data (= supervised), find a decision function f(x) with f(x) > 0 for x in class A and f(x) < 0 for x in class B. The input vector x is, e.g., the image pixels.

ML overview: use the training data to build a model. [Figure: labeled training points $x_1, \ldots, x_{16}$ separated by a decision boundary.]

ML overview: use the training data to build a model, then generalize to unseen test data.

ML overview, popular approaches. Neural networks: $f(x) = \sigma_2(M_2\, \sigma_1(M_1 x))$. Non-linear kernel learning: $f(x) = W \cdot \Phi(x)$.

Non-linear kernel learning: we want f(x) to separate the classes, but a linear classifier $f(x) = W \cdot x$ is often insufficient.

Non-linear kernel learning: apply a non-linear "feature map" $x \to \Phi(x)$ and take the decision function $f(x) = W \cdot \Phi(x)$, a linear classifier in feature space.

Example of a feature map: for $x = (x_1, x_2, x_3)$, take $\Phi(x) = (1, x_1, x_2, x_3, x_1 x_2, x_1 x_3, x_2 x_3)$; x is "lifted" to feature space.

Proposal for Learning

Grayscale image data

Map pixels to "spins".

x = input. Local feature map of dimension d = 2: $\phi(x_j) = \left[\cos\!\left(\tfrac{\pi}{2} x_j\right),\ \sin\!\left(\tfrac{\pi}{2} x_j\right)\right]$ with $x_j \in [0, 1]$. Crucially, different grayscale values are not mapped to orthogonal vectors.
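A minimal sketch of this local feature map in Python (the function name is just for illustration):

```python
import numpy as np

# Local feature map from the talk: phi(x_j) = [cos(pi/2 * x_j), sin(pi/2 * x_j)]
# for a grayscale value x_j in [0, 1].
def local_feature_map(x_j):
    return np.array([np.cos(np.pi / 2 * x_j), np.sin(np.pi / 2 * x_j)])

# Nearby grayscale values give nearly parallel (not orthogonal) feature vectors:
print(local_feature_map(0.0))   # [1, 0]  -> a fully "off" pixel
print(local_feature_map(1.0))   # [0, 1]  -> a fully "on" pixel
print(local_feature_map(0.5))   # equal mixture of the two
```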

x = input, $\phi$ = local feature map. Total feature map $\Phi(x)$: $\Phi^{s_1 s_2 \cdots s_N}(x) = \phi^{s_1}(x_1)\, \phi^{s_2}(x_2) \cdots \phi^{s_N}(x_N)$, a tensor product of the local feature vectors, just like a product-state wavefunction of spins. It is a vector in a $2^N$-dimensional space.

More detailed notation: for raw inputs $x = [x_1, x_2, x_3, \ldots, x_N]$, the feature vector is $\Phi(x) = \begin{bmatrix}\phi^1(x_1)\\ \phi^2(x_1)\end{bmatrix} \otimes \begin{bmatrix}\phi^1(x_2)\\ \phi^2(x_2)\end{bmatrix} \otimes \begin{bmatrix}\phi^1(x_3)\\ \phi^2(x_3)\end{bmatrix} \otimes \cdots \otimes \begin{bmatrix}\phi^1(x_N)\\ \phi^2(x_N)\end{bmatrix}$.

Tensor diagram notation: for raw inputs $x = [x_1, x_2, \ldots, x_N]$, the feature vector $\Phi(x)$ is drawn as N disconnected nodes with free legs $s_1, s_2, \ldots, s_N$, one local feature vector per input component.
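For very small N the total feature map can be built explicitly as a Kronecker product of the local feature vectors; a minimal sketch, keeping in mind that the full vector has $2^N$ components and is never formed explicitly in practice:

```python
import numpy as np
from functools import reduce

# Total feature map Phi(x) as a tensor (Kronecker) product of local feature
# vectors.  Only feasible explicitly for small N.
def local_feature_map(x_j):
    return np.array([np.cos(np.pi / 2 * x_j), np.sin(np.pi / 2 * x_j)])

def total_feature_map(x):
    return reduce(np.kron, [local_feature_map(x_j) for x_j in x])

x = np.array([0.1, 0.7, 0.3, 0.9])      # four "pixels"
Phi = total_feature_map(x)
print(Phi.shape)                         # (16,) = 2^4 components
```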

Construct the decision function $f(x) = W \cdot \Phi(x)$: in tensor diagram notation, the feature tensor $\Phi(x)$ (N disconnected legs) is contracted against the weight tensor W, a single node with N legs.

Main approximation: replace the order-N weight tensor W by a matrix product state (MPS).

MPS form of the decision function $f(x) = W \cdot \Phi(x)$.

Linear scaling: one can use an algorithm similar to DMRG to optimize the MPS. The cost scales as $N\, N_T\, m^3$, where N = size of the input, $N_T$ = size of the training set, and m = the MPS bond dimension. This could be improved further with stochastic gradient methods.
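A minimal sketch of why evaluation is cheap once W is an MPS: contracting one site at a time keeps only a length-m vector in memory, so a single evaluation of $f(x)$ costs $O(N\, d\, m^2)$ rather than $2^N$. The shapes and tensor names below are assumptions for illustration, not the authors' DMRG code:

```python
import numpy as np

N, d, m = 8, 2, 10
np.random.seed(1)

def local_feature_map(x_j):
    return np.array([np.cos(np.pi / 2 * x_j), np.sin(np.pi / 2 * x_j)])

# MPS tensors for W: boundary tensors (d, m) and (m, d), bulk tensors (m, d, m).
W_first = np.random.randn(d, m)
W_bulk  = [np.random.randn(m, d, m) for _ in range(N - 2)]
W_last  = np.random.randn(m, d)

def f(x):
    """Evaluate f(x) = W . Phi(x) by contracting site by site."""
    phis = [local_feature_map(xj) for xj in x]
    vec = phis[0] @ W_first                         # contract first site: length-m vector
    for A, phi in zip(W_bulk, phis[1:-1]):
        vec = vec @ np.einsum('isj,s->ij', A, phi)  # contract physical leg, then bond
    return vec @ (W_last @ phis[-1])                # close with the last site

print(f(np.random.rand(N)))
```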

Multi-class extension of the model: the decision function becomes $f^{\ell}(x) = W^{\ell} \cdot \Phi(x)$, where the index $\ell$ runs over the possible labels. The predicted label is $\operatorname{argmax}_{\ell} f^{\ell}(x)$.

MNIST experiment. MNIST is a benchmark data set of grayscale handwritten digits (labels $\ell$ = 0, 1, 2, ..., 9): 60,000 labeled training images and 10,000 labeled test images.

MNIST experiment: a one-dimensional mapping (ordering) of the image pixels is used, since the MPS is a one-dimensional network.

MNIST experiment results:
  bond dimension m = 10:  test set error ~5%   (500/10,000 incorrect)
  bond dimension m = 20:  test set error ~2%   (200/10,000 incorrect)
  bond dimension m = 120: test set error 0.97% (97/10,000 incorrect)
State of the art is < 1% test set error.

MNIST Experiment Demo Link: http://itensor.org/miles/digit/index.html

Understanding tensor network models: $f(x) = W \cdot \Phi(x)$.

Again assume W is an MPS in $f(x) = W \cdot \Phi(x)$. There are many interesting benefits; two are: 1. adaptivity, 2. feature sharing.

1. Tensor networks are adaptive: for example, boundary pixels of the grayscale training data carry no useful information for learning, and the network adapts to give them little weight.

2. Feature sharing: in $f^{\ell}(x) = W^{\ell} \cdot \Phi(x)$, each label $\ell$ has its own central tensor, while the "wings" of the MPS are shared between the models; this regularizes the models.

Feature sharing: the shared wings progressively learn shared features and deliver them to the label-specific central tensor $\ell$.

Nature of the weight tensor: the representer theorem says the exact optimal weights can be written $W = \sum_j \alpha_j\, \Phi(x_j)$, a sum over the training inputs. [Figure: density plots of the trained $W^{\ell}$ for each label $\ell = 0, 1, \ldots, 9$.]

The tensor network approximation can violate this condition: $W_{\mathrm{MPS}} \neq \sum_j \alpha_j\, \Phi(x_j)$ for any $\{\alpha_j\}$. So tensor network learning is not interpolation between training points; are there interesting consequences for generalization?

Some future directions: apply to 1D data sets (audio, time series); other tensor networks (TTN, PEPS, MERA); is it useful to interpret $|W \cdot \Phi(x)|^2$ as a probability? That could import even more physics insights. What features are extracted by individual elements of the tensor network?

What functions are realized for arbitrary W? Instead of the "spin" local feature map, use* $\phi(x) = (1, x)$. Recall the total feature map is $\Phi(x) = \begin{bmatrix}\phi^1(x_1)\\ \phi^2(x_1)\end{bmatrix} \otimes \begin{bmatrix}\phi^1(x_2)\\ \phi^2(x_2)\end{bmatrix} \otimes \begin{bmatrix}\phi^1(x_3)\\ \phi^2(x_3)\end{bmatrix} \otimes \cdots \otimes \begin{bmatrix}\phi^1(x_N)\\ \phi^2(x_N)\end{bmatrix}$. *Novikov et al., arXiv:1605.03795.

N=2 case with $\phi(x) = (1, x)$: $\Phi(x) = \begin{bmatrix}1\\ x_1\end{bmatrix} \otimes \begin{bmatrix}1\\ x_2\end{bmatrix} = (1, x_1, x_2, x_1 x_2)$, so with $W = (W_{11}, W_{21}, W_{12}, W_{22})$, $f(x) = W \cdot \Phi(x) = W_{11} + W_{21} x_1 + W_{12} x_2 + W_{22} x_1 x_2$.
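A small numerical check of this N=2 identity (indices below are 0-based, so `W[0, 0]` plays the role of $W_{11}$):

```python
import numpy as np

# Check that the contraction W . Phi(x) reproduces the polynomial
# W11 + W21*x1 + W12*x2 + W22*x1*x2 when phi(x) = (1, x).
W = np.random.randn(2, 2)                 # W[s1, s2]; index 0 <-> "1", index 1 <-> "x"
x1, x2 = 0.3, 0.8

Phi = np.einsum('i,j->ij', [1.0, x1], [1.0, x2])     # outer product [1, x1] (x) [1, x2]
f_contracted = np.einsum('ij,ij->', W, Phi)
f_polynomial = W[0, 0] + W[1, 0] * x1 + W[0, 1] * x2 + W[1, 1] * x1 * x2

assert np.isclose(f_contracted, f_polynomial)
```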

N=3 case: $\Phi(x) = \begin{bmatrix}1\\ x_1\end{bmatrix} \otimes \begin{bmatrix}1\\ x_2\end{bmatrix} \otimes \begin{bmatrix}1\\ x_3\end{bmatrix} = (1, x_1, x_2, x_3, x_1 x_2, x_1 x_3, x_2 x_3, x_1 x_2 x_3)$, so $f(x) = W \cdot \Phi(x) = W_{111} + W_{211} x_1 + W_{121} x_2 + W_{112} x_3 + W_{221} x_1 x_2 + W_{212} x_1 x_3 + W_{122} x_2 x_3 + W_{222} x_1 x_2 x_3$.

General N case, $x \in \mathbb{R}^N$: $f(x) = W \cdot \Phi(x) = W_{11\cdots 1} + W_{211\cdots 1} x_1 + W_{121\cdots 1} x_2 + W_{112\cdots 1} x_3 + \cdots + W_{221\cdots 1} x_1 x_2 + W_{212\cdots 1} x_1 x_3 + \cdots + W_{222\cdots 1} x_1 x_2 x_3 + \cdots + W_{22\cdots 2}\, x_1 x_2 x_3 \cdots x_N$ (constant, singles, doubles, triples, ..., up to the N-tuple term). The model has exponentially many formal parameters. Novikov, Trofimov, Oseledets, arXiv:1605.03795 (2016).

Related work. Novikov, Trofimov, Oseledets (arXiv:1605.03795): matrix product states + kernel learning, stochastic gradient descent. Cohen, Sharir, Shashua (arXiv:1410.0781, 1506.03059, 1603.00162, 1610.04167): tree tensor networks, expressivity of tensor network models, correlations of data (an analogue of entanglement entropy), a generative proposal.

Other MPS-related work (MPS are also known as "tensor trains"): Markov random field models, Novikov et al., Proceedings of the 31st ICML (2014); large-scale PCA, Lee, Cichocki, arXiv:1410.6895 (2014); feature extraction of tensor data, Bengua et al., IEEE Congress on Big Data (2015); compressing weights of neural nets, Novikov et al., Advances in Neural Information Processing Systems (2015).