Donald Goldfarb IEOR Department Columbia University UCLA Mathematics Department Distinguished Lecture Series May 17 19, 2016

Size: px
Start display at page:

Download "Donald Goldfarb IEOR Department Columbia University UCLA Mathematics Department Distinguished Lecture Series May 17 19, 2016"

Transcription

1 Optimization for Tensor Models Donald Goldfarb IEOR Department Columbia University UCLA Mathematics Department Distinguished Lecture Series May 17 19,

2 Tensors Matrix Tensor: higher-order matrix three-way tensor: p-way tensor: 2

3 Tensor == Vector? Fibers? Tensors can be viewed as a collection of fibers (Kolda & Bader, 2009) 3

4 Tensor == Matrix? Slices? Tensors can be viewed as a collection of slices (Kolda & Bader, 2009) 4

5 Tensors Why tensors? tensors capture multilinear structure more flexible and powerful models e.g. parameter estimation in latent variable modelling (to be discussed shortly) Tensor: object in its own right its own geometrical, statistical and computational issues much harder to work with than a matrix 5

6 Example: Single Topic Model Model: prob. k topics (dists. over d words) sample topic h: prob h = i = w i (i [k]) each document has m words x 1, x 2,, x m sampled i.i.d. from μ i o voc. Dataset: m-word documents Goal: learn parameters 6

7 Example: Single Topic Model Method of moments: Key idea: find parameters (approx.) consistent with observed moments (Pearson, 1894) Karl Pearson (1857~1936) Procedure: setup equations Moment population θ solve approximately = Moment sample 7

8 Example: Single Topic Model Goal: model the topics of the documents in a given corpus θ Samples of documents generated based on single topic model with params θ = ( μ i, w i ) Unsupervised learning alg. Method of moments setup equations solve them approx. Model parameters Which moments to use? 1 st -order? 2 nd -order? 3 rd -order? p th -order? 8

9 Example: Single Topic Model Binary encoding: x t = e i the t-th word in the document is i-th word in the vocabulary t, i m [d] E.g. topic: animal 3-word document 7-word vocabulary vocab. x 1 x 2 x 3 cats dogs fear I like raise want

10 Example: Single Topic Model Binary encoding: x t = e i First moment the t-th word in the document is i-th word in the vocabulary t, i m [d] Not identifiable: only d numbers for d + 1 k parameters. 10

11 Example: Single Topic Model Second moment Matrix-mode: still not identifiable even though d d+1 2 > d + 1 k. (Why?) 11

12 Example: Single Topic Model Identifiable? NO! U R n k is a solution M = UU T R n n M = UQ(UQ) T, for any Q O k (QQ T = Q T Q = I k ) U R n k is a solution for any Q O k AMBUIGUITY! Matrix mode: INSUFFICIENT to identify parameters.! 12

13 Example: Single Topic Model Third-order moment Tensor mode 13

14 Example: Single Topic Model Claim: uniquely determine the parameters θ 14

15 Example: Single Topic Model Reduction to orthogonal case via whitening Whiten: project to k dimensions transform to orthogonality M = UDU T W = UD 1/2 Reduced Eigen- Decomp. apply W to M and T v i i [k] forms an orthonormal basis for R k 15

16 Example: Single Topic Model Spectral theorem and eigen-decompositions v i i [k] forms an orthnormal basis for R k symmetric matrix symmetric tensor eigen-decomp. is unique iff λ i λ j if such decomp. exists, then it is always unique (even if λ i s all same) Uniqueness of orthogonal decomp. uniquely determine the parameters θ Identifiability issue is resolved via tensor mode! 16

17 2 nd Example: Mixture of Spherical Gaussians Model: k means Sample cluster h = i with probability w i (i [k]) Observe x, with i.i.d. homogeneous spherical noise Dataset: multiple points Goal: learn parameters 17

18 2 nd Example: Mixture of Spherical Gaussians Identifiable using 1 st, 2 nd and 3 rd order moments together [Hsu & Kakade, 13] Same approach as simple topic model! 18

19 General Principle Similar structures prevail in latent variable models Latent Dirichlet Allocatioin (LDA) Mixed Multinomial Logit Model Mixture of Gaussians (MoG) Hidden Markov Models (HMMs) 19

20 Orthogonal Decomposition How to find (λ i, v i ) i [k]? ( < v i, v j > = δ ij ) symmetric matrix symmetric tensor successive rank-one approximation (SROA) generalized SROA? 20

21 Successive Rank-One Approximation Symmetric matrix SROA: rank-one approximation + deflation Repeat k times (rank-one approx.) (deflation) Recover exactly ( ) 21

22 Successive Rank-One Approximation Symmetric orthogonal decomposable (SOD) tensor SROA: rank-one approximation + deflation Repeat k times (rank-one approx.) (deflation) Recover exactly up to sign flips [Zhang & Golub, 01] N.B.: no requirement on λ s to be distinct 22

23 Successive Rank-One Approximation Sampling error # samples = # samples < Other potential sources of perturbation Model misspecification Numerical error Is SROA robust to the perturbation? Recall matrix perturbation theory (e.g. Davis-Kahan), which requires perturbation matrix < min i j λ i λ j. 23

24 Successive Rank-One Approximation Perturbed SOD tensor SROA: rank-one approximation + deflation Repeat k times (rank-one approx.) (deflation) 24

25 Successive Rank-One Approximation Input: the perturbed SOD tensor Repeat k times Output: (rank-one approx.) (deflation) Theorem (MHG, 15) Output perm. on s.t. N.B. generalizes matrix perturbation analysis NO spectral gap quantity involved 25

26 Best Rank-One Tensor Approximation eliminate, then solve & 26

27 SDP Relaxation [Jiang, Ma & Zhang, 14] (*) reshape to induced symmetry & convex relaxation (*) The p = 3 tensor problem can be transformed into a p = 4 tensor problem [JMZ, 14]. 27

28 SDP Relaxation Linear constraints E.g. v = v 1, v 2 T X = vec(v v) vec v v T = = 4 v 1 v 3 1 v 2 3 v 2 v 1 v v 2 v 3 1 v 2 v v 2 v v 2 3 v 1 v 2 3 v 2 v 1 v v 2 v v 2 3 v 1 v 2 v v 2 3 v 1 v 2 3 v 1 v 2 4 v 2 v 1 2 v 1 v 2 v 2 v 1 v 2 2 v 1 2 v 1 v 2 v 2 v 1 v 2 2 spherical constr. 1 = v 2 2 = v v 2 2 = v v = trace(x) hyper- symmetry X = a d d b d b b e d b b e b e e c 28

29 SDP Relaxation N.B. SDP relaxation proposed by [Jiang, Ma & Zhang, 14] Square reshaping trick for low-rank tensor recovery [MHWG, 14] Equivalent SDP (of reduced size) proposed by [Nie & Wang, 14] using moment-based convex relaxation Empirically, observed almost always! i.e., SDP relaxation solves nonconvex problem! 29

30 Solving SDP Semidefinite programming (SDP) Linear constraints: 30

31 Deriving the dual problem Lagrangian function Dual problem 31

32 Augmented Lagrangian Method Augmented Lagrangian function Augmented Lagrangian method (ALM) k-th iteration: compute update minimizing jointly over is hard! 32

33 Alternating Direction Method of Multipliers (ADMM) Remedy: alternating direction minimize alternatively along y-direction and S-direction ADMM k-th iteration: y-update: S-update: X-update: Each step is (relatively) easy to compute! 33

34 Update y First-order optimality condition: assume is onto (surjective) 34

35 Update S with ( Eigen-Decomp.) 35

36 Update muliplier X 36

37 Convergence Semidefinite programming (SDP) Assumption: is onto s.t. and (Slater s condition) Convergence result Theorem (WGY, 10) From any starting point, a primal and dual solution 37

38 Another ADMM for Tensor Problem Tensor robust principal component anaysis (T-RPCA) X = L + S R n 1 n 2 n K Structural assumptions: L: low rank in each mode ( ) S: sparsely supported L (k) i.e. rank(l (k) ) is small for all k [K] i.e. cardinality S: S 0 is small Problem: Given X, recover L and S. 38

39 T-RPCA Convex surrogates Convex optimization rank(l (i) ) L (i) σ j (L (i) ) cardinality(s) S 1 S i1 i 2 i K min L,S s.t. λ i L i + S 1 L + S = X N.B. generalizes matrix robust PCA [CLMW, 11] theoretical guarantees [MHWG, 14] [HMGW, 15] 39

40 T-RPCA Variable Splitting i λ i L (i) L L 1 = L 2 = = L K i λ i L i, i 40

41 T-RPCA Reformulated problem min L i, S s.t. λ i L i, i + S 1 L i + S = X, i [K] Augmented Lagrangian function ADMM k-th iteration: L i -update: S-update: Λ i -update: 41

42 References [ZG] Zhang, T. and Golub G. H.. "Rank-one approximation to high order tensors." SIAM Journal on Matrix Analysis and Applications 23.2 (2001): [AGHKT] Anandkumar, A., et al. "Tensor decompositions for learning latent variable models." The Journal of Machine Learning Research 15.1 (2014): [HS] Hsu, D. and Sham M. K.. "Learning mixtures of spherical gaussians: moment methods and spectral decompositions." Proceedings of the 4th conference on Innovations in Theoretical Computer Science. ACM, [TB] Kolda, T. G. and Bader B. W. Bader.. "Tensor decompositions and applications." SIAM review 51.3 (2009): [JMZ] Jiang, B., Ma, S. and Zhang S.. "Tensor principal component analysis via convex optimization." Mathematical Programming150.2 (2015): [NW] Nie, J. and Wang, L.. "Semidefinite relaxations for best rank-1 tensor approximations." SIAM Journal on Matrix Analysis and Applications 35.3 (2014): [CLMW] Candès, E. J., et al. "Robust principal component analysis?."journal of the ACM (JACM) 58.3 (2011):

43 References [WGY] Wen, Z., Goldfarb, D., & Yin, W. (2010). Alternating direction augmented Lagrangian methods for semidefinite programming. Mathematical Programming Computation, 2(3-4), [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor recovery." Proceedings of the international conference on Machine learning. ACM, 2014 [GQ] Goldfarb, D., & Qin, Z.. "Robust low-rank tensor recovery: Models and algorithms." SIAM Journal on Matrix Analysis and Applications 35.1 (2014): [HMGW] Huang, B., et al. "Provable models for robust low-rank tensor completion." Pacific Journal of Optimization 11 (2015): [MHG] Mu, C., Hsu, D., & Goldfarb, D.. "Successive Rank-One Approximations for Nearly Orthogonally Decomposable Symmetric Tensors." SIAM Journal on Matrix Analysis and Applications 36.4 (2015):

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Tensor Decompositions for Machine Learning. G. Roeder 1. UBC Machine Learning Reading Group, June University of British Columbia

Tensor Decompositions for Machine Learning. G. Roeder 1. UBC Machine Learning Reading Group, June University of British Columbia Network Feature s Decompositions for Machine Learning 1 1 Department of Computer Science University of British Columbia UBC Machine Learning Group, June 15 2016 1/30 Contact information Network Feature

More information

Guaranteed Learning of Latent Variable Models through Spectral and Tensor Methods

Guaranteed Learning of Latent Variable Models through Spectral and Tensor Methods Guaranteed Learning of Latent Variable Models through Spectral and Tensor Methods Anima Anandkumar U.C. Irvine Application 1: Clustering Basic operation of grouping data points. Hypothesis: each data point

More information

ECE 598: Representation Learning: Algorithms and Models Fall 2017

ECE 598: Representation Learning: Algorithms and Models Fall 2017 ECE 598: Representation Learning: Algorithms and Models Fall 2017 Lecture 1: Tensor Methods in Machine Learning Lecturer: Pramod Viswanathan Scribe: Bharath V Raghavan, Oct 3, 2017 11 Introduction Tensors

More information

Orthogonal tensor decomposition

Orthogonal tensor decomposition Orthogonal tensor decomposition Daniel Hsu Columbia University Largely based on 2012 arxiv report Tensor decompositions for learning latent variable models, with Anandkumar, Ge, Kakade, and Telgarsky.

More information

Non-convex Robust PCA: Provable Bounds

Non-convex Robust PCA: Provable Bounds Non-convex Robust PCA: Provable Bounds Anima Anandkumar U.C. Irvine Joint work with Praneeth Netrapalli, U.N. Niranjan, Prateek Jain and Sujay Sanghavi. Learning with Big Data High Dimensional Regime Missing

More information

Efficient Spectral Methods for Learning Mixture Models!

Efficient Spectral Methods for Learning Mixture Models! Efficient Spectral Methods for Learning Mixture Models Qingqing Huang 2016 February Laboratory for Information & Decision Systems Based on joint works with Munther Dahleh, Rong Ge, Sham Kakade, Greg Valiant.

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

An ADMM Algorithm for Clustering Partially Observed Networks

An ADMM Algorithm for Clustering Partially Observed Networks An ADMM Algorithm for Clustering Partially Observed Networks Necdet Serhat Aybat Industrial Engineering Penn State University 2015 SIAM International Conference on Data Mining Vancouver, Canada Problem

More information

THE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING

THE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING THE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING Luis Rademacher, Ohio State University, Computer Science and Engineering. Joint work with Mikhail Belkin and James Voss This talk A new approach to multi-way

More information

Tensor Methods for Feature Learning

Tensor Methods for Feature Learning Tensor Methods for Feature Learning Anima Anandkumar U.C. Irvine Feature Learning For Efficient Classification Find good transformations of input for improved classification Figures used attributed to

More information

The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences

The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences Yuxin Chen Emmanuel Candès Department of Statistics, Stanford University, Sep. 2016 Nonconvex optimization

More information

Solving Corrupted Quadratic Equations, Provably

Solving Corrupted Quadratic Equations, Provably Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin

More information

Sum-of-Squares Method, Tensor Decomposition, Dictionary Learning

Sum-of-Squares Method, Tensor Decomposition, Dictionary Learning Sum-of-Squares Method, Tensor Decomposition, Dictionary Learning David Steurer Cornell Approximation Algorithms and Hardness, Banff, August 2014 for many problems (e.g., all UG-hard ones): better guarantees

More information

An Introduction to Spectral Learning

An Introduction to Spectral Learning An Introduction to Spectral Learning Hanxiao Liu November 8, 2013 Outline 1 Method of Moments 2 Learning topic models using spectral properties 3 Anchor words Preliminaries X 1,, X n p (x; θ), θ = (θ 1,

More information

Appendix A. Proof to Theorem 1

Appendix A. Proof to Theorem 1 Appendix A Proof to Theorem In this section, we prove the sample complexity bound given in Theorem The proof consists of three main parts In Appendix A, we prove perturbation lemmas that bound the estimation

More information

Heeyoul (Henry) Choi. Dept. of Computer Science Texas A&M University

Heeyoul (Henry) Choi. Dept. of Computer Science Texas A&M University Heeyoul (Henry) Choi Dept. of Computer Science Texas A&M University hchoi@cs.tamu.edu Introduction Speaker Adaptation Eigenvoice Comparison with others MAP, MLLR, EMAP, RMP, CAT, RSW Experiments Future

More information

Analysis of Robust PCA via Local Incoherence

Analysis of Robust PCA via Local Incoherence Analysis of Robust PCA via Local Incoherence Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 3244 hzhan23@syr.edu Yi Zhou Department of EECS Syracuse University Syracuse, NY 3244 yzhou35@syr.edu

More information

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang. Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

Fourier PCA. Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech)

Fourier PCA. Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech) Fourier PCA Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech) Introduction 1. Describe a learning problem. 2. Develop an efficient tensor decomposition. Independent component

More information

Estimating Latent Variable Graphical Models with Moments and Likelihoods

Estimating Latent Variable Graphical Models with Moments and Likelihoods Estimating Latent Variable Graphical Models with Moments and Likelihoods Arun Tejasvi Chaganty Percy Liang Stanford University June 18, 2014 Chaganty, Liang (Stanford University) Moments and Likelihoods

More information

CSC 576: Variants of Sparse Learning

CSC 576: Variants of Sparse Learning CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in

More information

ELE 538B: Mathematics of High-Dimensional Data. Spectral methods. Yuxin Chen Princeton University, Fall 2018

ELE 538B: Mathematics of High-Dimensional Data. Spectral methods. Yuxin Chen Princeton University, Fall 2018 ELE 538B: Mathematics of High-Dimensional Data Spectral methods Yuxin Chen Princeton University, Fall 2018 Outline A motivating application: graph clustering Distance and angles between two subspaces Eigen-space

More information

ROBUST LOW-RANK TENSOR MODELLING USING TUCKER AND CP DECOMPOSITION

ROBUST LOW-RANK TENSOR MODELLING USING TUCKER AND CP DECOMPOSITION ROBUST LOW-RANK TENSOR MODELLING USING TUCKER AND CP DECOMPOSITION Niannan Xue, George Papamakarios, Mehdi Bahri, Yannis Panagakis, Stefanos Zafeiriou Imperial College London, UK University of Edinburgh,

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

On the local stability of semidefinite relaxations

On the local stability of semidefinite relaxations On the local stability of semidefinite relaxations Diego Cifuentes Department of Mathematics Massachusetts Institute of Technology Joint work with Sameer Agarwal (Google), Pablo Parrilo (MIT), Rekha Thomas

More information

Approximate Low-Rank Tensor Learning

Approximate Low-Rank Tensor Learning Approximate Low-Rank Tensor Learning Yaoliang Yu Dept of Machine Learning Carnegie Mellon University yaoliang@cs.cmu.edu Hao Cheng Dept of Electric Engineering University of Washington kelvinwsch@gmail.com

More information

TENSOR LAYERS FOR COMPRESSION OF DEEP LEARNING NETWORKS. Cris Cecka Senior Research Scientist, NVIDIA GTC 2018

TENSOR LAYERS FOR COMPRESSION OF DEEP LEARNING NETWORKS. Cris Cecka Senior Research Scientist, NVIDIA GTC 2018 TENSOR LAYERS FOR COMPRESSION OF DEEP LEARNING NETWORKS Cris Cecka Senior Research Scientist, NVIDIA GTC 2018 Tensors Computations and the GPU AGENDA Tensor Networks and Decompositions Tensor Layers in

More information

Supplementary Material of A Novel Sparsity Measure for Tensor Recovery

Supplementary Material of A Novel Sparsity Measure for Tensor Recovery Supplementary Material of A Novel Sparsity Measure for Tensor Recovery Qian Zhao 1,2 Deyu Meng 1,2 Xu Kong 3 Qi Xie 1,2 Wenfei Cao 1,2 Yao Wang 1,2 Zongben Xu 1,2 1 School of Mathematics and Statistics,

More information

Clustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning

Clustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

Guaranteed Learning of Latent Variable Models through Tensor Methods

Guaranteed Learning of Latent Variable Models through Tensor Methods Guaranteed Learning of Latent Variable Models through Tensor Methods Furong Huang University of Maryland furongh@cs.umd.edu ACM SIGMETRICS Tutorial 2018 1/75 Tutorial Topic Learning algorithms for latent

More information

approximation algorithms I

approximation algorithms I SUM-OF-SQUARES method and approximation algorithms I David Steurer Cornell Cargese Workshop, 201 meta-task encoded as low-degree polynomial in R x example: f(x) = i,j n w ij x i x j 2 given: functions

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via convex relaxations Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Lecture 7: Con3nuous Latent Variable Models

Lecture 7: Con3nuous Latent Variable Models CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 7: Con3nuous Latent Variable Models All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/

More information

Linear dimensionality reduction for data analysis

Linear dimensionality reduction for data analysis Linear dimensionality reduction for data analysis Nicolas Gillis Joint work with Robert Luce, François Glineur, Stephen Vavasis, Robert Plemmons, Gabriella Casalino The setup Dimensionality reduction for

More information

Lecture Note 5: Semidefinite Programming for Stability Analysis

Lecture Note 5: Semidefinite Programming for Stability Analysis ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State

More information

Provable Alternating Minimization Methods for Non-convex Optimization

Provable Alternating Minimization Methods for Non-convex Optimization Provable Alternating Minimization Methods for Non-convex Optimization Prateek Jain Microsoft Research, India Joint work with Praneeth Netrapalli, Sujay Sanghavi, Alekh Agarwal, Animashree Anandkumar, Rashish

More information

Stochastic dynamical modeling:

Stochastic dynamical modeling: Stochastic dynamical modeling: Structured matrix completion of partially available statistics Armin Zare www-bcf.usc.edu/ arminzar Joint work with: Yongxin Chen Mihailo R. Jovanovic Tryphon T. Georgiou

More information

Dictionary Learning Using Tensor Methods

Dictionary Learning Using Tensor Methods Dictionary Learning Using Tensor Methods Anima Anandkumar U.C. Irvine Joint work with Rong Ge, Majid Janzamin and Furong Huang. Feature learning as cornerstone of ML ML Practice Feature learning as cornerstone

More information

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming CSC2411 - Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming Notes taken by Mike Jamieson March 28, 2005 Summary: In this lecture, we introduce semidefinite programming

More information

Optimization for Learning and Big Data

Optimization for Learning and Big Data Optimization for Learning and Big Data Donald Goldfarb Department of IEOR Columbia University Department of Mathematics Distinguished Lecture Series May 17-19, 2016. Lecture 1. First-Order Methods for

More information

Introduction to Alternating Direction Method of Multipliers

Introduction to Alternating Direction Method of Multipliers Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction

More information

Computational Lower Bounds for Statistical Estimation Problems

Computational Lower Bounds for Statistical Estimation Problems Computational Lower Bounds for Statistical Estimation Problems Ilias Diakonikolas (USC) (joint with Daniel Kane (UCSD) and Alistair Stewart (USC)) Workshop on Local Algorithms, MIT, June 2018 THIS TALK

More information

Alternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization

Alternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization Alternating Direction Method of Multipliers Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last time: dual ascent min x f(x) subject to Ax = b where f is strictly convex and closed. Denote

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

arxiv: v4 [cs.lg] 26 Apr 2018

arxiv: v4 [cs.lg] 26 Apr 2018 : Online Spectral Learning for Single Topic Models Tong Yu 1, Branislav Kveton 2, Zheng Wen 2, Hung Bui 3, and Ole J. Mengshoel 1 arxiv:1709.07172v4 [cs.lg] 26 Apr 2018 1 Carnegie Mellon University 2 Adobe

More information

Fantope Regularization in Metric Learning

Fantope Regularization in Metric Learning Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction

More information

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:

More information

Optimal Linear Estimation under Unknown Nonlinear Transform

Optimal Linear Estimation under Unknown Nonlinear Transform Optimal Linear Estimation under Unknown Nonlinear Transform Xinyang Yi The University of Texas at Austin yixy@utexas.edu Constantine Caramanis The University of Texas at Austin constantine@utexas.edu Zhaoran

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Basis Learning as an Algorithmic Primitive

Basis Learning as an Algorithmic Primitive JMLR: Workshop and Conference Proceedings vol 49:1 42, 2016 Basis Learning as an Algorithmic Primitive Mikhail Belkin Luis Rademacher James Voss The Ohio State University MBELKIN@CSE.OHIO-STATE.EDU LRADEMAC@CSE.OHIO-STATE.EDU

More information

Sparse and Robust Optimization and Applications

Sparse and Robust Optimization and Applications Sparse and and Statistical Learning Workshop Les Houches, 2013 Robust Laurent El Ghaoui with Mert Pilanci, Anh Pham EECS Dept., UC Berkeley January 7, 2013 1 / 36 Outline Sparse Sparse Sparse Probability

More information

Identifiability and Learning of Topic Models: Tensor Decompositions under Structural Constraints

Identifiability and Learning of Topic Models: Tensor Decompositions under Structural Constraints Identifiability and Learning of Topic Models: Tensor Decompositions under Structural Constraints Anima Anandkumar U.C. Irvine Joint work with Daniel Hsu, Majid Janzamin Adel Javanmard and Sham Kakade.

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial

More information

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth

More information

A projection algorithm for strictly monotone linear complementarity problems.

A projection algorithm for strictly monotone linear complementarity problems. A projection algorithm for strictly monotone linear complementarity problems. Erik Zawadzki Department of Computer Science epz@cs.cmu.edu Geoffrey J. Gordon Machine Learning Department ggordon@cs.cmu.edu

More information

Tensor intro 1. SIAM Rev., 51(3), Tensor Decompositions and Applications, Kolda, T.G. and Bader, B.W.,

Tensor intro 1. SIAM Rev., 51(3), Tensor Decompositions and Applications, Kolda, T.G. and Bader, B.W., Overview 1. Brief tensor introduction 2. Stein s lemma 3. Score and score matching for fitting models 4. Bringing it all together for supervised deep learning Tensor intro 1 Tensors are multidimensional

More information

Probabilistic Time Series Classification

Probabilistic Time Series Classification Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign

More information

Matrix-Tensor and Deep Learning in High Dimensional Data Analysis

Matrix-Tensor and Deep Learning in High Dimensional Data Analysis Matrix-Tensor and Deep Learning in High Dimensional Data Analysis Tien D. Bui Department of Computer Science and Software Engineering Concordia University 14 th ICIAR Montréal July 5-7, 2017 Introduction

More information

Dimension Reduction (PCA, ICA, CCA, FLD,

Dimension Reduction (PCA, ICA, CCA, FLD, Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous 10-701 lectures 1 Outline Dimension reduction

More information

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY

More information

Agenda. Applications of semidefinite programming. 1 Control and system theory. 2 Combinatorial and nonconvex optimization

Agenda. Applications of semidefinite programming. 1 Control and system theory. 2 Combinatorial and nonconvex optimization Agenda Applications of semidefinite programming 1 Control and system theory 2 Combinatorial and nonconvex optimization 3 Spectral estimation & super-resolution Control and system theory SDP in wide use

More information

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang Deep Learning Basics Lecture 7: Factor Analysis Princeton University COS 495 Instructor: Yingyu Liang Supervised v.s. Unsupervised Math formulation for supervised learning Given training data x i, y i

More information

Kernel methods, kernel SVM and ridge regression

Kernel methods, kernel SVM and ridge regression Kernel methods, kernel SVM and ridge regression Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Collaborative Filtering 2 Collaborative Filtering R: rating matrix; U: user factor;

More information

Efficient algorithms for estimating multi-view mixture models

Efficient algorithms for estimating multi-view mixture models Efficient algorithms for estimating multi-view mixture models Daniel Hsu Microsoft Research, New England Outline Multi-view mixture models Multi-view method-of-moments Some applications and open questions

More information

MIT Algebraic techniques and semidefinite optimization February 14, Lecture 3

MIT Algebraic techniques and semidefinite optimization February 14, Lecture 3 MI 6.97 Algebraic techniques and semidefinite optimization February 4, 6 Lecture 3 Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo In this lecture, we will discuss one of the most important applications

More information

Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints

Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints Animashree Anandkumar 1, Daniel Hsu 2, Adel Javanmard 3, and Sham M. Kakade 2 1 Department of EECS, University of California,

More information

Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time

Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time Zhaoran Wang Huanran Lu Han Liu Department of Operations Research and Financial Engineering Princeton University Princeton, NJ 08540 {zhaoran,huanranl,hanliu}@princeton.edu

More information

Learning Structured Probability Matrices!

Learning Structured Probability Matrices! Learning Structured Probability Matrices Qingqing Huang 2016 February Laboratory for Information & Decision Systems Based on joint work with Sham Kakade, Weihao Kong and Greg Valiant. 1 2 Learning Data

More information

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Lecture 21: Spectral Learning for Graphical Models

Lecture 21: Spectral Learning for Graphical Models 10-708: Probabilistic Graphical Models 10-708, Spring 2016 Lecture 21: Spectral Learning for Graphical Models Lecturer: Eric P. Xing Scribes: Maruan Al-Shedivat, Wei-Cheng Chang, Frederick Liu 1 Motivation

More information

Volodymyr Kuleshov ú Arun Tejasvi Chaganty ú Percy Liang. May 11, 2015

Volodymyr Kuleshov ú Arun Tejasvi Chaganty ú Percy Liang. May 11, 2015 Tensor Factorization via Matrix Factorization Volodymyr Kuleshov ú Arun Tejasvi Chaganty ú Percy Liang Stanford University May 11, 2015 Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization

More information

Lecture 8. Principal Component Analysis. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 13, 2016

Lecture 8. Principal Component Analysis. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 13, 2016 Lecture 8 Principal Component Analysis Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 13, 2016 Luigi Freda ( La Sapienza University) Lecture 8 December 13, 2016 1 / 31 Outline 1 Eigen

More information

Compressed Sensing and Robust Recovery of Low Rank Matrices

Compressed Sensing and Robust Recovery of Low Rank Matrices Compressed Sensing and Robust Recovery of Low Rank Matrices M. Fazel, E. Candès, B. Recht, P. Parrilo Electrical Engineering, University of Washington Applied and Computational Mathematics Dept., Caltech

More information

A Quantum Interior Point Method for LPs and SDPs

A Quantum Interior Point Method for LPs and SDPs A Quantum Interior Point Method for LPs and SDPs Iordanis Kerenidis 1 Anupam Prakash 1 1 CNRS, IRIF, Université Paris Diderot, Paris, France. September 26, 2018 Semi Definite Programs A Semidefinite Program

More information

CS6375: Machine Learning Gautam Kunapuli. Support Vector Machines

CS6375: Machine Learning Gautam Kunapuli. Support Vector Machines Gautam Kunapuli Example: Text Categorization Example: Develop a model to classify news stories into various categories based on their content. sports politics Use the bag-of-words representation for this

More information

SDP Relaxations for MAXCUT

SDP Relaxations for MAXCUT SDP Relaxations for MAXCUT from Random Hyperplanes to Sum-of-Squares Certificates CATS @ UMD March 3, 2017 Ahmed Abdelkader MAXCUT SDP SOS March 3, 2017 1 / 27 Overview 1 MAXCUT, Hardness and UGC 2 LP

More information

Sparse Optimization Lecture: Dual Certificate in l 1 Minimization

Sparse Optimization Lecture: Dual Certificate in l 1 Minimization Sparse Optimization Lecture: Dual Certificate in l 1 Minimization Instructor: Wotao Yin July 2013 Note scriber: Zheng Sun Those who complete this lecture will know what is a dual certificate for l 1 minimization

More information

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)

More information

Topic Models. Charles Elkan November 20, 2008

Topic Models. Charles Elkan November 20, 2008 Topic Models Charles Elan elan@cs.ucsd.edu November 20, 2008 Suppose that we have a collection of documents, and we want to find an organization for these, i.e. we want to do unsupervised learning. One

More information

arxiv: v1 [math.st] 10 Sep 2015

arxiv: v1 [math.st] 10 Sep 2015 Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees Department of Statistics Yudong Chen Martin J. Wainwright, Department of Electrical Engineering and

More information

26 : Spectral GMs. Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G.

26 : Spectral GMs. Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G. 10-708: Probabilistic Graphical Models, Spring 2015 26 : Spectral GMs Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G. 1 Introduction A common task in machine learning is to work with

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via nonconvex optimization Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Algorithm-Hardware Co-Optimization of Memristor-Based Framework for Solving SOCP and Homogeneous QCQP Problems

Algorithm-Hardware Co-Optimization of Memristor-Based Framework for Solving SOCP and Homogeneous QCQP Problems L.C.Smith College of Engineering and Computer Science Algorithm-Hardware Co-Optimization of Memristor-Based Framework for Solving SOCP and Homogeneous QCQP Problems Ao Ren Sijia Liu Ruizhe Cai Wujie Wen

More information

On the convergence of higher-order orthogonality iteration and its extension

On the convergence of higher-order orthogonality iteration and its extension On the convergence of higher-order orthogonality iteration and its extension Yangyang Xu IMA, University of Minnesota SIAM Conference LA15, Atlanta October 27, 2015 Best low-multilinear-rank approximation

More information

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR

More information

Machine Learning (CSE 446): Unsupervised Learning: K-means and Principal Component Analysis

Machine Learning (CSE 446): Unsupervised Learning: K-means and Principal Component Analysis Machine Learning (CSE 446): Unsupervised Learning: K-means and Principal Component Analysis Sham M Kakade c 2019 University of Washington cse446-staff@cs.washington.edu 0 / 10 Announcements Please do Q1

More information

Lecture Notes 9: Constrained Optimization

Lecture Notes 9: Constrained Optimization Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Tensor Factorization via Matrix Factorization

Tensor Factorization via Matrix Factorization Volodymyr Kuleshov Arun Tejasvi Chaganty Percy Liang Department of Computer Science Stanford University Stanford, CA 94305 Abstract Tensor factorization arises in many machine learning applications, such

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.11

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.11 XI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.11 Alternating direction methods of multipliers for separable convex programming Bingsheng He Department of Mathematics

More information

ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition

ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition Wing-Kin (Ken) Ma 2017 2018 Term 2 Department of Electronic Engineering The Chinese University

More information

Learning Tractable Graphical Models: Latent Trees and Tree Mixtures

Learning Tractable Graphical Models: Latent Trees and Tree Mixtures Learning Tractable Graphical Models: Latent Trees and Tree Mixtures Anima Anandkumar U.C. Irvine Joint work with Furong Huang, U.N. Niranjan, Daniel Hsu and Sham Kakade. High-Dimensional Graphical Modeling

More information

Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated

Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated Fall 3 Computer Vision Overview of Statistical Tools Statistical Inference Haibin Ling Observation inference Decision Prior knowledge http://www.dabi.temple.edu/~hbling/teaching/3f_5543/index.html Bayesian

More information

COMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017

COMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017 COMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University TOPIC MODELING MODELS FOR TEXT DATA

More information

Spectral Learning of Hidden Markov Models with Group Persistence

Spectral Learning of Hidden Markov Models with Group Persistence 000 00 00 00 00 00 00 007 008 009 00 0 0 0 0 0 0 07 08 09 00 0 0 0 0 0 0 07 08 09 00 0 0 0 0 0 0 07 08 09 00 0 0 0 0 0 0 07 08 09 00 0 0 0 0 Abstract In this paper, we develop a general Method of Moments

More information