
1 Online Principal Component Analysis. Boutsidis, Garber, Karnin, Liberty. Presented by Zohar Karnin, November 23, 2014.

2 Data Matrix. Often, data is represented as a huge matrix. Sometimes, we can't store the entire matrix.

3 Principal Component Analysis. Often, we require a low-rank approximation of a matrix A: recommender systems, images, LSA, and so on. The approximation is used to save space and, often, to clean up noise. [Figure: A ≈ a low-rank factorization.]

4 Column-by-Column Stream. Data arrives column by column (column = item), and we see the items one at a time.

5–6 The Formal Stream Setup. Observe x_1 ∈ R^d, output y_1 ∈ R^k.

7 The Formal Stream Setup. Observe x_1 ∈ R^d, output y_1 ∈ R^k; ...; observe x_t ∈ R^d, output y_t ∈ R^k.

8 The Formal Stream Setup. Cost = min_Φ Σ_t ‖x_t − Φy_t‖², s.t. Φ is an embedding from R^k to R^d, i.e. ‖Φy_i − Φy_j‖ = ‖y_i − y_j‖. [Figure: the input matrix X and the output matrix Y.]

9–13 The Cost Function. Y is the output, X is the input, and ΦY is the embedding of Y into the same space as X. The error matrix is R = X − ΦY. Frobenius error: ‖R‖_F² = Σ_ij (X_ij − (ΦY)_ij)² = MSE. Spectral error: ‖R‖_2 = max_{‖v‖=1} ‖v^T X − v^T(ΦY)‖.
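To make the two error measures concrete, here is a small numpy sketch; the matrices X, Y, and the embedding Phi below are hypothetical stand-ins for the stream's input, the algorithm's output, and an isometric embedding R^k → R^d.

```python
import numpy as np

d, n, k = 50, 200, 5
rng = np.random.default_rng(0)

X = rng.standard_normal((d, n))                      # input: columns are the observed x_t
Y = rng.standard_normal((k, n))                      # output: columns are the y_t
Phi, _ = np.linalg.qr(rng.standard_normal((d, k)))   # isometric embedding R^k -> R^d

R = X - Phi @ Y                                      # error matrix
frob_error = np.linalg.norm(R, 'fro') ** 2           # sum_ij (X_ij - (Phi Y)_ij)^2
spectral_error = np.linalg.norm(R, 2)                # max over unit v of ||v^T X - v^T Phi Y||
print(frob_error, spectral_error)
```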

14 Secondary Costs: Computational Resources. Run time (the number of operations required per observed column) and memory.

15–17 Previous Works.
Regret minimization setting [WK 07], [NKW 13]: at time t, before observing x_t, predict U_t, a projection matrix onto a k-dimensional subspace. The loss is ‖x_t − U_t x_t‖². Each U_t can be completely different.
Stochastic setting [ACS 13], [MCJ 13], [BDF 13]: the x_t are drawn i.i.d. from some distribution. Objective: find, as quickly as possible, a U minimizing E[‖x_t − U x_t‖²].
Reconstruction matrix (not an embedding) [CW 09]: min Σ_t ‖x_t − Φy_t‖², s.t. Φ is an arbitrary linear transformation from R^k to R^d.

18–23 Results.
X = d × n matrix whose columns are observed; k << d.
X_k = best rank-k approximation of X (top k directions); OPT = ‖X − X_k‖_F.
Theorem 1: Given ‖X‖_F, k, ε: Error = OPT + ε‖X‖_F, with memory, target dimension, and processing time per column all O(k/ε²).
Theorem 2: Given only k, ε: Error = OPT + ε‖X‖_F, with memory, target dimension, and processing time per column all O(k/ε³).
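For reference, the benchmark OPT in these theorems is computable offline with a single SVD; a minimal numpy sketch (X is again a hypothetical input matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 50, 200, 5
X = rng.standard_normal((d, n))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]      # best rank-k approximation (top k directions)
OPT = np.linalg.norm(X - X_k, 'fro')      # OPT = ||X - X_k||_F
print(OPT)
```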

24–26 The Operator Norm Cost Function.
Y = output matrix [y_1, ..., y_n]; Cost = ‖X − ΦY‖_F². Interpretation: mean squared error (noise: ‖X − X_k‖_F, signal: ‖X_k‖_F).
For noisy data ‖X − X_k‖_F ≫ ‖X_k‖_F is possible, but ‖X − X_k‖_2 ≤ ‖X_k‖_2.
Alternative cost: ‖X − ΦY‖_2. Interpretation: bounds, for every unit vector v, ‖v^T X − v^T ΦY‖.
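A tiny numeric illustration of the point above, with a hypothetical rank-k signal buried in heavy noise; the exact numbers depend on the draw, but the qualitative gap between the two norms is what the slide is after.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, k = 100, 1000, 3

signal = rng.standard_normal((d, k)) @ rng.standard_normal((k, n))  # rank-k signal
X = signal + 5.0 * rng.standard_normal((d, n))                      # plus heavy noise

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# In Frobenius norm the noise dominates: ||X - X_k||_F can exceed ||X_k||_F ...
print(np.linalg.norm(X - X_k, 'fro'), np.linalg.norm(X_k, 'fro'))
# ... while in spectral norm the residual stays below the signal: ||X - X_k||_2 <= ||X_k||_2.
print(np.linalg.norm(X - X_k, 2), np.linalg.norm(X_k, 2))
```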

27 Results. Theorem 3 [under construction]: Given ‖X‖_2, ‖X − X_k‖_2, k, ε: operator norm error = OPT_operator + ε‖X‖_2, with target dimension O(k/ε).

28–34 Algorithm. Maintain U: R^d → R^ℓ. Directions are only added, never removed (for now). r = tolerable error radius = ‖X‖_F / √ℓ. [Figure, built up over these slides: the error ellipsoid of the residuals.]

35–36 Algorithm. Add vector u_1 to U, the direction along which the error ellipsoid has reached the tolerable radius r. [Figure: error ellipsoid with u_1 added.]
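Below is a minimal Python sketch consistent with these slides, not the paper's exact routine: the error ellipsoid is tracked as the covariance of the residuals, and whenever its longest axis reaches the tolerable radius r, that direction is added to U. The stream, the budget ℓ, and the assumption that ‖X‖_F is known in advance (as in Theorem 1) are all inputs.

```python
import numpy as np

def online_pca_sketch(columns, d, ell, X_frob):
    """Directions are only added, never removed (as on the slides)."""
    r2 = X_frob ** 2 / ell        # squared tolerable radius: r^2 = ||X||_F^2 / ell
    U = np.zeros((0, d))          # current map U: R^d -> R^m, rows are orthonormal directions
    C = np.zeros((d, d))          # covariance of the residuals (the error ellipsoid)
    outputs = []
    for x in columns:
        res = x - U.T @ (U @ x)   # residual of x_t outside the span of U
        C += np.outer(res, res)
        vals, vecs = np.linalg.eigh(C)
        if vals[-1] >= r2:        # error ellipsoid reached the tolerable radius r:
            u = vecs[:, -1]       # add its longest axis as a new direction
            U = np.vstack([U, u])
            C -= vals[-1] * np.outer(u, u)   # that direction no longer accumulates error
        outputs.append(U @ x)     # y_t (its dimension is the number of directions so far)
    return U, outputs
```

A real implementation would replace the dense d × d covariance C with the d × ℓ sketch Z described a few slides below, which is what brings the memory and per-column time down to O(dℓ).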

37–39 Analysis: Target Dimension. r = tolerable error radius = ‖X‖_F / √ℓ. Target dimension = number of vectors added to U. Observation: adding a vector to U requires ‖X‖_F²/ℓ weight out of the total ‖X‖_F², so the number of vectors added to U is at most ℓ.

40 Analysis: Cost. Y = output matrix; R = error matrix = X − U_n^T Y. Operator norm cost = ‖R‖_2 = max{r_1, r_2}; Frobenius cost = ‖R‖_F² = r_1² + r_2². [Figure: error ellipsoid with semi-axes r_1, r_2.]

41 Analysis: Cost. r = tolerable error radius = ‖X‖_F / √ℓ; Y = output matrix; R = error matrix = X − U_n^T Y. Statements: ‖R‖_2 ≤ r = ‖X‖_F / √ℓ, and ‖R‖_F ≤ (loss from X_k) + (loss from X − X_k) ≤ ‖X‖_F √(k/ℓ) + ‖X − X_k‖_F.
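Plugging in the target-dimension choice behind Theorem 1 ties the two statements to the advertised guarantee; a sketch of the plug-in, assuming ℓ = k/ε²:

‖R‖_F ≤ ‖X‖_F √(k/ℓ) + ‖X − X_k‖_F = ε‖X‖_F + OPT.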

42–44 Implementation: Memory and Run-time Complexity.
r_t = x_t − U_t^T U_t x_t (the residual of x_t against the current directions); R = [r_1, r_2, ..., r_t].
The straightforward version requires maintaining RR^T: update time and memory are d².
Instead, maintain a d × ℓ matrix Z such that ZZ^T ≈ RR^T, with ‖ZZ^T − RR^T‖_2 < ‖R‖_F²/ℓ [Lib 12]. Update time and memory drop to dℓ.
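The sketch Z can be maintained with a Frequent-Directions-style update in the spirit of [Lib 12]; the routine below is a simplified variant of that idea, not the exact procedure referenced on the slide. It keeps Z at d × ℓ, and whenever Z fills up it shrinks the singular values so that roughly half the columns are freed, at an additive spectral-norm error of order ‖R‖_F²/ℓ.

```python
import numpy as np

class FrequentDirections:
    """Maintain Z (d x ell, with d >= ell) so that Z Z^T approximates R R^T."""
    def __init__(self, d, ell):
        self.ell = ell
        self.Z = np.zeros((d, ell))
        self.next_free = 0

    def update(self, r):
        if self.next_free == self.ell:                 # sketch is full: shrink it
            U, s, _ = np.linalg.svd(self.Z, full_matrices=False)
            shrunk = np.maximum(s ** 2 - s[self.ell // 2] ** 2, 0.0)
            self.Z = U * np.sqrt(shrunk)               # columns ell//2 .. ell-1 become zero
            self.next_free = self.ell // 2
        self.Z[:, self.next_free] = r                  # append the new residual column r_t
        self.next_free += 1
```

Each SVD costs O(dℓ²) and is triggered once every ℓ/2 columns, so the amortized per-column work is O(dℓ), matching the slide's dℓ update time and memory.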

45–52 Implementation: Unknown Horizon.
The error radius parameter ‖X‖_F / √ℓ requires knowing ‖X‖_F in advance.
Def: X_t = [x_1, ..., x_t]. Idea: use the growing radius parameter ‖X_t‖_F / √ℓ.
Thm: works as before, but the target dimension becomes ℓ log(n). Divide time into epochs; within each epoch N ≤ ‖X_t‖_F² ≤ 2N, so at most ℓ directions are added per epoch.
Idea 2: if a direction u becomes weak (‖u^T X_t‖ ≤ ‖X_t‖_F / √ℓ), remove it. Thm: works as before, with target dimension ℓ/ε.
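A small sketch of the bookkeeping behind the two ideas, with hypothetical state names of my own; it only shows how the growing radius and the weak-direction test would be maintained online, not the full algorithm.

```python
import numpy as np

def unknown_horizon_step(x_t, state, ell):
    """state['mass'] tracks ||X_t||_F^2; state['energy'][i] tracks ||u_i^T X_t||^2."""
    state['mass'] += float(x_t @ x_t)            # grow ||X_t||_F^2 with the new column
    r2_t = state['mass'] / ell                   # growing radius: r_t^2 = ||X_t||_F^2 / ell
    for i, u in list(state['directions'].items()):
        state['energy'][i] += float(u @ x_t) ** 2
        if state['energy'][i] <= r2_t:           # direction became weak (Idea 2): remove it
            del state['directions'][i]
            del state['energy'][i]
    return r2_t
```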

53 Conclusions and Open Questions. We obtain error = OPT + ε‖X‖_F with target dimension O(k/ε³). Can we reduce the dependence on ε? Improve to OPT(1 + ε)? Lower bound? (Currently the same as for an arbitrary reconstruction matrix.) Obtain an approximation of OPT + ε‖X − X_k‖_F.

54 Thank you!
