Three right directions and three wrong directions for tensor research

Three right directions and three wrong directions for tensor research
Michael W. Mahoney, Stanford University
(For more info, see http://cs.stanford.edu/people/mmahoney/ or Google "Michael Mahoney".)

Lots and lots of large data!
- High energy physics experimental data
- Hyperspectral medical and astronomical image data
- DNA microarray data and DNA SNP data
- Medical literature analysis data
- Collaboration and citation networks
- Internet networks and web graph data
- Advertiser-bidded phrase data
- Static and dynamic social network data

Scientific and Internet data
(Figure: an individuals x SNPs genotype matrix, with entries such as AA, AG, GG, CT, ...)

Algorithmic vs. Statistical Perspectives (Lambert, 2000)
Computer scientists:
- Data: a record of everything that happened.
- Goal: process the data to find interesting patterns and associations.
- Methodology: develop approximation algorithms under different models of data access, since the goal is typically computationally hard.
Statisticians:
- Data: a particular random instantiation of an underlying process describing unobserved patterns in the world.
- Goal: extract information about the world from noisy data.
- Methodology: make inferences (perhaps about unseen events) by positing a model that describes the random variability of the data around the deterministic model.

Matrices and Data
(Figure: the same individuals x SNPs genotype matrix.)
Matrices provide simple representations of data:
- A_ij = 0 or 1 (perhaps then weighted), depending on whether word i appears in document j.
- A_ij = -1, 0, +1, depending on whether individual i is homozygous for the major allele, heterozygous, or homozygous for the minor allele at SNP j.
We can then take advantage of nice properties of vector spaces:
- structural properties: SVD, Euclidean geometry
- algorithmic properties: everything is O(n^3)
- statistical properties: PCA, regularization, etc.
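To make the two encodings above concrete, here is a minimal NumPy sketch (the toy documents and genotypes are invented for illustration, not taken from the talk) that builds a word-document matrix and a SNP matrix and then applies the SVD/PCA machinery the slide alludes to:

```python
import numpy as np

# Term-document matrix: A[i, j] = 1 if word i appears in document j, else 0.
docs = ["the cat sat", "the dog sat", "the dog barked"]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[1 if w in d.split() else 0 for d in docs] for w in vocab], dtype=float)

# SNP matrix (individuals x SNPs): -1 / 0 / +1 for major homozygote,
# heterozygote, minor homozygote.  Values here are illustrative only.
S = np.array([[ 1,  0, -1,  1],
              [ 0,  0,  1, -1],
              [-1,  1,  1,  0]], dtype=float)

# "Nice properties of vector spaces": the SVD gives the best rank-k approximation.
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
print("rank-2 reconstruction error:", np.linalg.norm(A - A_k))

# PCA is the SVD of the column-centered matrix: project individuals onto top 2 PCs.
S_centered = S - S.mean(axis=0)
_, s2, V2t = np.linalg.svd(S_centered, full_matrices=False)
print("top-2 principal components per individual:\n", S_centered @ V2t[:2].T)
```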

Graphs and Data
Interaction graph model of networks:
- Nodes represent entities.
- Edges represent interactions between pairs of entities.
Common variations include:
- Directed graphs
- Weighted graphs
- Bipartite graphs

Why model data as graphs and matrices?
Graphs and matrices:
- provide natural mathematical structures with algorithmic, statistical, and geometric benefits;
- provide a nice tradeoff between a rich descriptive framework and sufficient algorithmic structure;
- provide regularization due to geometry, either explicitly due to R^n or implicitly due to approximation algorithms.

What if graphs/matrices don't work?
Employ more general mathematical structures:
- Hypergraphs
- Attributes associated with nodes
- Kernelized data, using, e.g., a similarity notion (see the sketch below)
- Generalized linear or hierarchical models
- Tensors!!
These structures provide greater descriptive flexibility, but that flexibility typically comes at a (moderate or severe) computational cost.
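As one illustration of the kernelization option, here is a minimal sketch (assuming a Gaussian/RBF similarity and toy random data; neither choice comes from the talk) of replacing raw features with a Gram matrix of pairwise similarities, which many vector-space methods can then consume directly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                 # 5 objects, 3 raw features

def rbf_kernel(X, gamma=0.5):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

K = rbf_kernel(X)
# K is symmetric positive semidefinite, so it plays the role of an inner-product
# matrix; kernel PCA, kernel k-means, SVMs, etc. operate on K rather than on X.
print("kernel matrix eigenvalues (all >= 0 up to roundoff):", np.linalg.eigvalsh(K))
```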

What is a tensor? (1 of 3) See L. H. Lim's tutorial on tensors at MMDS 2006.

What is a tensor? (2 of 3)

What is a tensor? (3 of 3) IMPORTANT: This is similar to NLA --- but there is no reason to expect the subscript-manipulation methods, so useful in NLA, to yield anything meaningful for more general algebraic structures.

Tensor ranks and data analysis (1 of 3)

Tensor ranks and data analysis (2 of 3) IMPORTANT: These ill-posedness results are NOT pathological --- they are ubiquitous and essential properties of tensors.
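A standard example of this ill-posedness (a textbook border-rank construction, not taken from these slides) can be checked numerically: the rank-3 tensor T below is approximated arbitrarily well by rank-2 tensors, so a *best* rank-2 approximation of T simply does not exist.

```python
import numpy as np

def outer3(x, y, z):
    # 3-way outer product: (outer3(x, y, z))[i, j, k] = x[i] * y[j] * z[k]
    return np.einsum('i,j,k->ijk', x, y, z)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
T = outer3(a, a, b) + outer3(a, b, a) + outer3(b, a, a)   # has rank 3

for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    c = a + eps * b
    # Rank-2 tensor: (1/eps) * [ (a + eps*b)^{(x3)} - a^{(x3)} ]
    T_eps = (outer3(c, c, c) - outer3(a, a, a)) / eps
    print(f"eps={eps:g}  ||T - T_eps|| = {np.linalg.norm(T - T_eps):.2e}")
# The error shrinks to 0 with eps, yet no rank-2 tensor ever attains the limit.
```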

Tensor ranks and data analysis (3 of 3) THAT IS: To get a simple or low-rank tensor approximation, we focus on exceptions to fundamental ill-posedness properties of tensors (i.e., rank-1 tensors and 2-mode tensors).

Historical Perspective on NLA
- NLA grew out of statistics, among other areas (40s and 50s).
- NLA focuses on numerical issues (60s, 70s, and 80s).
- Large-scale data generation becomes increasingly common (90s and 00s).
- NLA has suffered due to the success of PageRank and HITS.
Large-scale scientific and Internet data problems invite us to take a broader perspective on traditional NLA:
- revisit the algorithmic basis of common NLA matrix algorithms;
- revisit the statistical underpinnings of NLA;
- expand the traditional NLA view of tensors.

The gap between NLA and TCS
Matrix factorizations:
- in NLA and scientific computing: used to express a problem such that it can be solved more easily;
- in TCS and statistical data analysis: used to represent structure that may be present in a matrix obtained from object-feature observations.
NLA: emphasis on optimal conditioning, backward error analysis issues, and whether the running time is a large or small constant multiplied by n^2 or n^3.
TCS: motivated by large data applications:
- space-constrained or pass-efficient models;
- over-sampling and randomness as computational resources (see the sketch below).
MMDS06 and MMDS08 were designed to bridge the gap between NLA, TCS, and data applications.
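As a sketch of what "over-sampling and randomness as computational resources" can mean, here is a generic randomized range-finder for low-rank matrix approximation (an illustration of the idea only, not any specific algorithm from the talk; the over-sampling parameter p and the synthetic matrix are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, p = 2000, 300, 10, 5            # p = over-sampling parameter
# Synthetic nearly-rank-k matrix plus a little noise.
A = rng.normal(size=(m, k)) @ rng.normal(size=(k, n)) + 1e-3 * rng.normal(size=(m, n))

Omega = rng.normal(size=(n, k + p))      # random test matrix (randomness as a resource)
Q, _ = np.linalg.qr(A @ Omega)           # orthonormal basis for the sketched range of A
B = Q.T @ A                              # small (k+p) x n matrix
Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
A_k = ((Q @ Ub[:, :k]) * s[:k]) @ Vt[:k, :]   # randomized rank-k approximation

print("relative error of randomized rank-k approximation:",
      np.linalg.norm(A - A_k) / np.linalg.norm(A))
```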

How to bridge the gap (lessons from MMDS)
- In a vector space, everything is easy; multi-linear structure captures the inherent intractability of NP-hardness.
- Convexity is an appropriate generalization of linearity: a nice algorithmic framework, as with kernels in machine learning.
- Randomness, over-sampling, approximation, ... are powerful algorithmic resources, but you need a clear objective that you are solving.
- The geometry of combinatorial objects (e.g., graphs, tensors, etc.) has positive algorithmic, statistical, and conceptual benefits.
- Approximate computation induces implicit statistical regularization.

Examples of tensor data (Acar and Yener, 2008)
- Chemistry: model fluorescence excitation-emission data in food science; A_ijk is samples x emission x excitation.
- Neuroscience: EEG data as patients, doses, conditions, etc. are varied; A_ijk is time samples x frequency x electrodes.
- Social network and Web analysis, to discover hidden structures: A_ijk is webpages x webpages x anchor text; A_ijk is users x queries x webpages; A_ijk is advertisers x bidded phrases x time.
- Computer vision: image compression and face recognition; the tensor is pixels x illumination x expression x viewpoint x person.
- Also: quantum mechanics, large-scale computation, hyperspectral data, climate data, ICA, nonnegative data, blind source separation, NP-hard problems, ...
"Tensor-based data are particularly challenging due to their size and since many data analysis tools based on graph theory and linear algebra do not easily generalize." -- MMDS06
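For instance, the users x queries x webpages tensor above can be assembled directly from interaction records; a minimal sketch with hypothetical toy triples (the identifiers are invented for illustration):

```python
import numpy as np

# (user, query, clicked webpage) interaction records.
triples = [("u0", "q0", "w1"), ("u0", "q1", "w0"),
           ("u1", "q0", "w1"), ("u1", "q0", "w1"), ("u2", "q2", "w2")]

users = sorted({u for u, _, _ in triples})
queries = sorted({q for _, q, _ in triples})
pages = sorted({w for _, _, w in triples})

# A[i, j, k] counts how often user i issued query j and clicked webpage k.
A = np.zeros((len(users), len(queries), len(pages)))
for u, q, w in triples:
    A[users.index(u), queries.index(q), pages.index(w)] += 1

print(A.shape)                                                        # (users, queries, webpages)
print(A[users.index("u1"), queries.index("q0"), pages.index("w1")])   # 2.0
```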

Three Right Directions

Four! Right Directions
1. Understand the statistical and algorithmic assumptions under which tensor methods work. (NOT just independence.)
2. Understand the geometry of tensors. (NOT of the vector spaces you unfold them to.)
3. Understand WHY tensors work in physical applications and what this says about less structured data applications (and vice versa, which has been very fruitful for matrices*).
4. Understand unfolding as a process of defining features. (Since this puts you in a nice algorithmic place; see the sketch below.)
*(E.g., low-rank off-diagonal blocks are common in matrices -- since the world is 3D -- which is not true in less structured applications; this has significant algorithmic implications.)
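On point 4, a minimal sketch of mode-n unfolding (NumPy only; the unfolding convention and the toy tensor are chosen for illustration) shows how each unfolding defines a different feature set and hands you back ordinary matrix tools:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 3, size=(4, 5, 6)).astype(float)   # users x queries x webpages

def unfold(T, mode):
    # Move the chosen mode to the front, then flatten the remaining modes.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

A1 = unfold(A, 0)   # 4 x 30: each user described by 30 (query, webpage) features
A2 = unfold(A, 1)   # 5 x 24: each query described by 24 (user, webpage) features
print(A1.shape, A2.shape)

# Once unfolded, ordinary matrix tools (SVD, regression, clustering) apply --
# at the price of forgetting the multi-way structure the tensor encoded.
U, s, Vt = np.linalg.svd(A1, full_matrices=False)
print("singular values of the mode-1 unfolding:", np.round(s, 2))
```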

Three Wrong Directions

Three Wrong Directions
1. Viewing tensors as matrices with additional subscripts. (That may be true, but it hampers you, since R^n is so nice.)
2. Using methods that damage geometry and enhance sparsity. (BTW, you will do this if you don't understand the underlying geometric and sparsity structure.)
3. Doing "Applied Ramsey Theory". Theorem: given a large enough universe of data, for any algorithm there exists a data set such that it performs well. (Show me where your method fails AND where it succeeds! Otherwise, is your result about your data or your method? Of course, this applies more generally in data analysis.)

Conclusions
- Large-scale data applications have been the main driver for much of the interest in tensors.
- Tensors are tricky to deal with, both algorithmically and statistically.
- Let's use this meeting to refine my directions in light of motivating data applications.