
Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda

Vector space
Consists of: a set V, a scalar field (usually R or C), and two operations: vector addition + and scalar multiplication ·

Properties
For any $x, y \in V$, $x + y$ belongs to $V$
For any $x \in V$ and any scalar $\alpha$, $\alpha x \in V$
There exists a zero vector $0$ such that $x + 0 = x$ for any $x \in V$
For any $x \in V$ there exists an additive inverse $y$ such that $x + y = 0$, usually denoted by $-x$

Properties
The vector sum is commutative and associative, i.e. for all $x, y, z \in V$
$$x + y = y + x, \qquad (x + y) + z = x + (y + z)$$
Scalar multiplication is associative: for any scalars $\alpha$ and $\beta$ and any $x \in V$
$$\alpha (\beta x) = (\alpha \beta) x$$
Scalar and vector sums are both distributive, i.e. for any scalars $\alpha$ and $\beta$ and any $x, y \in V$
$$(\alpha + \beta) x = \alpha x + \beta x, \qquad \alpha (x + y) = \alpha x + \alpha y$$

Subspaces A subspace of a vector space V is any subset of V that is also itself a vector space

Linear dependence/independence
A set of $m$ vectors $x_1, x_2, \ldots, x_m$ is linearly dependent if there exist $m$ scalar coefficients $\alpha_1, \alpha_2, \ldots, \alpha_m$, not all equal to zero, such that
$$\sum_{i=1}^m \alpha_i x_i = 0$$
Equivalently, at least one vector in a linearly dependent set can be expressed as a linear combination of the rest
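Linear dependence can be checked numerically by computing the rank of the matrix whose columns are the vectors: the set is dependent exactly when the rank is smaller than the number of vectors. A minimal numpy sketch (the specific vectors are made up for illustration):

```python
import numpy as np

# Columns of X are the vectors x_1, x_2, x_3.
x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, 1.0])
x3 = 2 * x1 - 3 * x2          # a linear combination of x1 and x2
X = np.column_stack([x1, x2, x3])

# The set is linearly dependent iff rank(X) < number of vectors.
print(np.linalg.matrix_rank(X))         # 2 < 3, so {x1, x2, x3} is dependent
print(np.linalg.matrix_rank(X[:, :2]))  # 2, so {x1, x2} is independent
```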

Span
The span of $\{x_1, \ldots, x_m\}$ is the set of all possible linear combinations
$$\operatorname{span}(x_1, \ldots, x_m) := \left\{ y \;\middle|\; y = \sum_{i=1}^m \alpha_i x_i \text{ for some scalars } \alpha_1, \alpha_2, \ldots, \alpha_m \right\}$$
The span of any set of vectors in $V$ is a subspace of $V$

Basis and dimension
A basis of a vector space $V$ is a set of linearly independent vectors $\{x_1, \ldots, x_m\}$ such that $V = \operatorname{span}(x_1, \ldots, x_m)$
If $V$ has a basis with finite cardinality, then every basis contains the same number of vectors
The dimension $\dim(V)$ of $V$ is the cardinality of any of its bases
Equivalently, the dimension is the number of linearly independent vectors that span $V$

Standard basis
$$e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$$
The dimension of $\mathbb{R}^n$ is $n$

Inner product
Operation $\langle \cdot, \cdot \rangle$ that maps a pair of vectors to a scalar

Properties
If the scalar field is $\mathbb{R}$, it is symmetric: for any $x, y \in V$
$$\langle x, y \rangle = \langle y, x \rangle$$
If the scalar field is $\mathbb{C}$, then for any $x, y \in V$
$$\langle x, y \rangle = \overline{\langle y, x \rangle}$$
where for any $\alpha \in \mathbb{C}$, $\overline{\alpha}$ is the complex conjugate of $\alpha$

Properties
It is linear in the first argument, i.e. for any $\alpha \in \mathbb{R}$ and any $x, y, z \in V$
$$\langle \alpha x, y \rangle = \alpha \langle x, y \rangle, \qquad \langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$$
If the scalar field is $\mathbb{R}$, it is also linear in the second argument
It is positive definite: $\langle x, x \rangle$ is nonnegative for all $x \in V$, and if $\langle x, x \rangle = 0$ then $x = 0$

Dot product
Inner product between $x, y \in \mathbb{R}^n$
$$x \cdot y := \sum_i x[i]\, y[i]$$
$\mathbb{R}^n$ endowed with the dot product is usually called a Euclidean space of dimension $n$
If $x, y \in \mathbb{C}^n$
$$x \cdot y := \sum_i x[i]\, \overline{y[i]}$$

Sample covariance
Quantifies joint fluctuations of two quantities or features
For a data set $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
$$\operatorname{cov}((x_1, y_1), \ldots, (x_n, y_n)) := \frac{1}{n-1} \sum_{i=1}^n \left(x_i - \operatorname{av}(x_1, \ldots, x_n)\right)\left(y_i - \operatorname{av}(y_1, \ldots, y_n)\right)$$
where the average or sample mean is defined by
$$\operatorname{av}(a_1, \ldots, a_n) := \frac{1}{n} \sum_{i=1}^n a_i$$
If $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are iid samples from $\mathrm{x}$ and $\mathrm{y}$
$$\mathrm{E}\left(\operatorname{cov}((x_1, y_1), \ldots, (x_n, y_n))\right) = \operatorname{Cov}(\mathrm{x}, \mathrm{y}) := \mathrm{E}\left((\mathrm{x} - \mathrm{E}(\mathrm{x}))(\mathrm{y} - \mathrm{E}(\mathrm{y}))\right)$$
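As a quick illustration, the sample covariance can be computed directly from the definition above and checked against numpy's built-in estimator, which uses the same $1/(n-1)$ normalization; the data below are arbitrary:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

n = len(x)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 sample covariance matrix; the off-diagonal entry
# is the sample covariance between x and y.
assert np.isclose(cov_xy, np.cov(x, y)[0, 1])
print(cov_xy)
```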

Matrix inner product
The inner product between two $m \times n$ matrices $A$ and $B$ is
$$\langle A, B \rangle := \operatorname{tr}\left(A^T B\right) = \sum_{i=1}^m \sum_{j=1}^n A_{ij} B_{ij}$$
where the trace of an $n \times n$ matrix is defined as the sum of its diagonal
$$\operatorname{tr}(M) := \sum_{i=1}^n M_{ii}$$
For any pair of $m \times n$ matrices $A$ and $B$
$$\operatorname{tr}\left(B^T A\right) = \operatorname{tr}\left(A B^T\right)$$
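A small numpy check of these identities: the trace formula agrees with the entrywise sum, and $\operatorname{tr}(B^T A) = \operatorname{tr}(A B^T)$ (random matrices are used only for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((3, 2))

inner_trace = np.trace(A.T @ B)   # <A, B> = tr(A^T B)
inner_sum = np.sum(A * B)         # sum of the entrywise products A_ij B_ij
assert np.isclose(inner_trace, inner_sum)
assert np.isclose(np.trace(B.T @ A), np.trace(A @ B.T))
print(inner_trace)
```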

Function inner product
The inner product between two complex-valued square-integrable functions $f, g$ defined on an interval $[a, b]$ of the real line is
$$\langle f, g \rangle := \int_a^b f(x)\, \overline{g(x)}\, dx$$

Norm
Let $V$ be a vector space; a norm is a function $\|\cdot\|$ from $V$ to $\mathbb{R}$ with the following properties
It is homogeneous: for any scalar $\alpha$ and any $x \in V$, $\|\alpha x\| = |\alpha|\, \|x\|$
It satisfies the triangle inequality $\|x + y\| \leq \|x\| + \|y\|$; in particular, $\|x\| \geq 0$
$\|x\| = 0$ implies $x = 0$

Inner-product norm
Square root of the inner product of a vector with itself
$$\|x\|_{\langle \cdot, \cdot \rangle} := \sqrt{\langle x, x \rangle}$$

Inner-product norm
Vectors in $\mathbb{R}^n$ or $\mathbb{C}^n$: $\ell_2$ norm
$$\|x\|_2 := \sqrt{x \cdot x} = \sqrt{\sum_{i=1}^n |x[i]|^2}$$
Matrices in $\mathbb{R}^{m \times n}$ or $\mathbb{C}^{m \times n}$: Frobenius norm
$$\|A\|_F := \sqrt{\operatorname{tr}\left(A^T A\right)} = \sqrt{\sum_{i=1}^m \sum_{j=1}^n |A_{ij}|^2}$$
Square-integrable complex-valued functions: $L_2$ norm
$$\|f\|_{L_2} := \sqrt{\langle f, f \rangle} = \sqrt{\int_a^b |f(x)|^2\, dx}$$
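To make the correspondence concrete, here is a brief numpy sketch verifying that the $\ell_2$ and Frobenius norms equal the square roots of the corresponding inner products (arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
A = rng.standard_normal((4, 3))

# l2 norm of a vector: square root of the dot product with itself.
assert np.isclose(np.linalg.norm(x, 2), np.sqrt(x @ x))

# Frobenius norm of a matrix: sqrt of tr(A^T A) = sqrt of the sum of squared entries.
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A.T @ A)))
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(A ** 2)))
print(np.linalg.norm(x, 2), np.linalg.norm(A, 'fro'))
```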

Cauchy-Schwarz inequality
For any two vectors $x$ and $y$ in an inner-product space
$$|\langle x, y \rangle| \leq \|x\|_{\langle \cdot, \cdot \rangle}\, \|y\|_{\langle \cdot, \cdot \rangle}$$
Assume $\|x\|_{\langle \cdot, \cdot \rangle} \neq 0$, then
$$\langle x, y \rangle = -\|x\|_{\langle \cdot, \cdot \rangle}\, \|y\|_{\langle \cdot, \cdot \rangle} \iff y = -\frac{\|y\|_{\langle \cdot, \cdot \rangle}}{\|x\|_{\langle \cdot, \cdot \rangle}}\, x$$
$$\langle x, y \rangle = \|x\|_{\langle \cdot, \cdot \rangle}\, \|y\|_{\langle \cdot, \cdot \rangle} \iff y = \frac{\|y\|_{\langle \cdot, \cdot \rangle}}{\|x\|_{\langle \cdot, \cdot \rangle}}\, x$$

Sample variance and standard deviation
The sample variance quantifies fluctuations around the average
$$\operatorname{var}(x_1, x_2, \ldots, x_n) := \frac{1}{n-1} \sum_{i=1}^n \left(x_i - \operatorname{av}(x_1, x_2, \ldots, x_n)\right)^2$$
If $x_1, x_2, \ldots, x_n$ are iid samples from $\mathrm{x}$
$$\mathrm{E}\left(\operatorname{var}(x_1, x_2, \ldots, x_n)\right) = \operatorname{Var}(\mathrm{x}) := \mathrm{E}\left((\mathrm{x} - \mathrm{E}(\mathrm{x}))^2\right)$$
The sample standard deviation is
$$\operatorname{std}(x_1, x_2, \ldots, x_n) := \sqrt{\operatorname{var}(x_1, x_2, \ldots, x_n)}$$

Correlation coefficient
Normalized covariance
$$\rho_{(x_1,y_1),\ldots,(x_n,y_n)} := \frac{\operatorname{cov}((x_1, y_1), \ldots, (x_n, y_n))}{\operatorname{std}(x_1, \ldots, x_n)\, \operatorname{std}(y_1, \ldots, y_n)}$$
Corollary of Cauchy-Schwarz:
$$-1 \leq \rho_{(x_1,y_1),\ldots,(x_n,y_n)} \leq 1$$
$$\rho = -1 \iff y_i = \operatorname{av}(y_1, \ldots, y_n) - \frac{\operatorname{std}(y_1, \ldots, y_n)}{\operatorname{std}(x_1, \ldots, x_n)}\, \left(x_i - \operatorname{av}(x_1, \ldots, x_n)\right)$$
$$\rho = 1 \iff y_i = \operatorname{av}(y_1, \ldots, y_n) + \frac{\operatorname{std}(y_1, \ldots, y_n)}{\operatorname{std}(x_1, \ldots, x_n)}\, \left(x_i - \operatorname{av}(x_1, \ldots, x_n)\right)$$
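The correlation coefficient is easy to compute from the sample covariance and standard deviations; the sketch below (with made-up data) also checks that the result matches np.corrcoef and lies in $[-1, 1]$, as guaranteed by Cauchy-Schwarz:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.4, 3.8, 5.1])

n = len(x)
cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
std_x = np.sqrt(np.sum((x - x.mean()) ** 2) / (n - 1))
std_y = np.sqrt(np.sum((y - y.mean()) ** 2) / (n - 1))
rho = cov / (std_x * std_y)

assert -1.0 <= rho <= 1.0                       # corollary of Cauchy-Schwarz
assert np.isclose(rho, np.corrcoef(x, y)[0, 1])
print(rho)
```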

Correlation coefficient
[figure: scatter plots of data sets with $\rho_{x,y}$ = 0.50, 0.90, 0.99 (top row) and 0.00, −0.90, −0.99 (bottom row)]

Temperature data
Temperature in Oxford over 150 years
Feature 1: Temperature in January
Feature 2: Temperature in August
[figure: scatter plot of the two features (axes labeled April and August), $\rho$ = 0.269]

Temperature data
Temperature in Oxford over 150 years (monthly)
Feature 1: Maximum temperature
Feature 2: Minimum temperature
[figure: scatter plot of minimum vs. maximum temperature, $\rho$ = 0.962]

Parallelogram law
A norm $\|\cdot\|$ on a vector space $V$ is an inner-product norm if and only if for any $x, y \in V$
$$2\|x\|^2 + 2\|y\|^2 = \|x - y\|^2 + \|x + y\|^2$$
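The parallelogram law gives a quick numerical test of whether a norm comes from an inner product. A minimal sketch checking that the $\ell_2$ norm satisfies it while the $\ell_1$ norm does not (for one particular pair of vectors):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

def parallelogram_gap(norm):
    """Difference between the two sides of the parallelogram law."""
    return 2 * norm(x) ** 2 + 2 * norm(y) ** 2 - norm(x - y) ** 2 - norm(x + y) ** 2

print(parallelogram_gap(lambda v: np.linalg.norm(v, 2)))  # 0: l2 is an inner-product norm
print(parallelogram_gap(lambda v: np.linalg.norm(v, 1)))  # -4: l1 violates the law
```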

$\ell_1$ and $\ell_\infty$ norms
Norms in $\mathbb{R}^n$ or $\mathbb{C}^n$ not induced by an inner product
$$\|x\|_1 := \sum_{i=1}^n |x[i]|, \qquad \|x\|_\infty := \max_i |x[i]|$$
Hölder's inequality
$$|\langle x, y \rangle| \leq \|x\|_1\, \|y\|_\infty$$
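A brief numerical check of Hölder's inequality on randomly drawn vectors (a sanity check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(1000):
    x = rng.standard_normal(10)
    y = rng.standard_normal(10)
    # |<x, y>| <= ||x||_1 ||y||_inf (small tolerance for floating-point error)
    assert abs(x @ y) <= np.linalg.norm(x, 1) * np.linalg.norm(y, np.inf) + 1e-12
print("Holder's inequality holds on all sampled pairs")
```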

Norm balls
[figure: unit balls of the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms]

Distance
The distance between two vectors $x$ and $y$ induced by a norm $\|\cdot\|$ is
$$d(x, y) := \|x - y\|$$

Classification
Aim: assign a signal to one of $k$ predefined classes
Training data: $n$ pairs of signals (represented as vectors) and labels: $\{x_1, l_1\}, \ldots, \{x_n, l_n\}$

Nearest-neighbor classification: assign to a test signal the label of its closest training signal (see the sketch below)
[figure: nearest-neighbor illustration]
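The nearest-neighbor rule is simple to implement: compute the $\ell_2$ distance from the test vector to every training vector and return the label of the closest one. A minimal sketch with an invented toy data set (the face-recognition experiment on the following slides applies the same idea to images in $\mathbb{R}^{4096}$):

```python
import numpy as np

def nearest_neighbor(X_train, labels, x_test):
    """Return the label of the training vector closest to x_test in l2 distance."""
    distances = np.linalg.norm(X_train - x_test, axis=1)
    return labels[np.argmin(distances)]

# Toy training set: one signal per row, one label per row.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
labels = np.array(["A", "A", "B"])
print(nearest_neighbor(X_train, labels, np.array([4.0, 4.5])))  # prints "B"
```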

Face recognition
Training set: 360 images of size 64 × 64 from 40 different subjects (9 each)
Test set: 1 new image from each subject
We model each image as a vector in $\mathbb{R}^{4096}$ and use the $\ell_2$-norm distance

Face recognition
Training set
[figure: sample of the training images]

Nearest-neighbor classification
Errors: 4 / 40
[figure: test images and their closest training images]

Orthogonality
Two vectors $x$ and $y$ are orthogonal if and only if $\langle x, y \rangle = 0$
A vector $x$ is orthogonal to a set $S$ if $\langle x, s \rangle = 0$ for all $s \in S$
Two sets $S_1, S_2$ are orthogonal if $\langle x, y \rangle = 0$ for any $x \in S_1$, $y \in S_2$
The orthogonal complement of a subspace $S$ is
$$S^\perp := \{x \mid \langle x, y \rangle = 0 \text{ for all } y \in S\}$$

Pythagorean theorem
If $x$ and $y$ are orthogonal
$$\|x + y\|_{\langle \cdot, \cdot \rangle}^2 = \|x\|_{\langle \cdot, \cdot \rangle}^2 + \|y\|_{\langle \cdot, \cdot \rangle}^2$$

Orthonormal basis
Basis of mutually orthogonal vectors with inner-product norm equal to one
If $\{u_1, \ldots, u_n\}$ is an orthonormal basis of a vector space $V$, for any $x \in V$
$$x = \sum_{i=1}^n \langle u_i, x \rangle\, u_i$$

Gram-Schmidt
Builds an orthonormal basis from a set of linearly independent vectors $x_1, \ldots, x_m$ in $\mathbb{R}^n$
1. Set $u_1 := x_1 / \|x_1\|_2$
2. For $i = 2, \ldots, m$, compute
$$v_i := x_i - \sum_{j=1}^{i-1} \langle u_j, x_i \rangle\, u_j$$
and set $u_i := v_i / \|v_i\|_2$
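A direct numpy implementation of the procedure above, as a sketch; it assumes the columns of the input matrix are linearly independent:

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X (assumed linearly independent)."""
    n, m = X.shape
    U = np.zeros((n, m))
    U[:, 0] = X[:, 0] / np.linalg.norm(X[:, 0])
    for i in range(1, m):
        # Subtract the projections onto the previously computed basis vectors.
        v = X[:, i] - U[:, :i] @ (U[:, :i].T @ X[:, i])
        U[:, i] = v / np.linalg.norm(v)
    return U

X = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
U = gram_schmidt(X)
print(np.round(U.T @ U, 10))  # identity matrix: the columns are orthonormal
```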

Direct sum
For any subspaces $S_1, S_2$ such that $S_1 \cap S_2 = \{0\}$ the direct sum is defined as
$$S_1 \oplus S_2 := \{x \mid x = s_1 + s_2, \; s_1 \in S_1, \; s_2 \in S_2\}$$
Any vector $x \in S_1 \oplus S_2$ has a unique representation
$$x = s_1 + s_2, \quad s_1 \in S_1, \; s_2 \in S_2$$

Orthogonal projection
The orthogonal projection of $x$ onto a subspace $S$ is a vector denoted by $P_S x$ such that $x - P_S x \in S^\perp$
The orthogonal projection is unique

Orthogonal projection
[figure: illustration of the orthogonal projection of a vector onto a subspace]

Orthogonal projection
Any vector $x$ can be decomposed into
$$x = P_S x + P_{S^\perp} x$$
For any orthonormal basis $b_1, \ldots, b_m$ of $S$,
$$P_S x = \sum_{i=1}^m \langle x, b_i \rangle\, b_i$$
The orthogonal projection is a linear operation: for any $x$ and $y$
$$P_S(x + y) = P_S x + P_S y$$
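These properties translate directly into code: with an orthonormal basis of $S$ stored as the columns of a matrix $B$, the formula above becomes $P_S x = B B^T x$. A short sketch (np.linalg.qr is used here simply as a convenient way to obtain an orthonormal basis; the data are random):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2))      # S = span of the columns of A
B, _ = np.linalg.qr(A)               # orthonormal basis of S (columns of B)

x = rng.standard_normal(5)
proj = B @ (B.T @ x)                 # P_S x = sum_i <x, b_i> b_i

# x decomposes into P_S x + P_{S^perp} x; the residual is orthogonal to S.
residual = x - proj
print(np.round(B.T @ residual, 10))  # ~0: the residual lies in the orthogonal complement
```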

Dimension of orthogonal complement
Let $V$ be a finite-dimensional vector space; for any subspace $S \subseteq V$
$$\dim(S) + \dim\left(S^\perp\right) = \dim(V)$$

Orthogonal projection is closest
The orthogonal projection $P_S x$ of a vector $x$ onto a subspace $S$ is the solution to the optimization problem
$$\underset{u}{\text{minimize}} \quad \|x - u\|_{\langle \cdot, \cdot \rangle} \qquad \text{subject to} \quad u \in S$$

Proof
Take any point $s \in S$ such that $s \neq P_S x$. Then
$$\|x - s\|_{\langle \cdot, \cdot \rangle}^2 = \|x - P_S x + P_S x - s\|_{\langle \cdot, \cdot \rangle}^2 = \|x - P_S x\|_{\langle \cdot, \cdot \rangle}^2 + \|P_S x - s\|_{\langle \cdot, \cdot \rangle}^2 > \|x - P_S x\|_{\langle \cdot, \cdot \rangle}^2$$
where the second equality follows from the Pythagorean theorem, since $x - P_S x \in S^\perp$ and $P_S x - s \in S$

Denoising
Aim: estimate a signal from perturbed measurements
If the noise is additive, the data are modeled as the sum of the signal $x$ and a perturbation $z$
$$y := x + z$$
The goal is to estimate $x$ from $y$
Assumptions about the signal and noise structure are necessary

Denoising via orthogonal projection
Assumption: the signal is well approximated as belonging to a predefined subspace $S$
Estimate: $P_S y$, the orthogonal projection of the noisy data onto $S$
Error:
$$\|x - P_S y\|_2^2 = \|P_{S^\perp} x\|_2^2 + \|P_S z\|_2^2$$

Proof
$$x - P_S y = x - P_S x - P_S z = P_{S^\perp} x - P_S z$$
The error expression then follows from the Pythagorean theorem, since $P_{S^\perp} x \in S^\perp$ and $P_S z \in S$
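A small simulation of denoising by projection, where the signal is drawn from a known low-dimensional subspace and corrupted with additive Gaussian noise (the dimensions and noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 5
B, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis of the subspace S

x = B @ rng.standard_normal(k)                     # signal lies in S
z = 0.1 * rng.standard_normal(n)                   # additive noise
y = x + z                                          # observed data

x_hat = B @ (B.T @ y)                              # estimate: P_S y
print(np.linalg.norm(x - y))      # error of the raw data
print(np.linalg.norm(x - x_hat))  # error after projecting onto S (typically much smaller)
```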

Error
[figure: geometric illustration showing the subspace $S$, the vectors $x$, $z$, $y$, the projections $P_S z$, $P_S y$, $P_{S^\perp} x$, and the error]

Face denoising
Training set: 360 images of size 64 × 64 from 40 different subjects (9 each)
Noise: iid Gaussian noise with
$$\text{SNR} := \frac{\|x\|_2}{\|z\|_2} = 6.67$$
We model each image as a vector in $\mathbb{R}^{4096}$

Face denoising
We denoise by projecting onto:
$S_1$: the span of the 9 images from the same subject
$S_2$: the span of the 360 images in the training set
Test error:
$$\frac{\|x - P_{S_1} y\|_2}{\|x\|_2} = 0.114, \qquad \frac{\|x - P_{S_2} y\|_2}{\|x\|_2} = 0.078$$

$S_1$ := span of the 9 training images of the test subject
[figure: the images spanning $S_1$]

Denoising via projection onto $S_1$
[figure: image decomposition. Signal $x$ = 0.993 (projection onto $S_1$) + 0.114 (projection onto $S_1^\perp$); Noise $z$ = 0.007 (projection onto $S_1$) + 0.150 (projection onto $S_1^\perp$); Data $y$ = Signal + Noise; Estimate $P_{S_1} y$]

$S_2$ := span of the 360 images in the training set
[figure: a sample of the images spanning $S_2$]

Denoising via projection onto $S_2$
[figure: image decomposition. Signal $x$ = 0.998 (projection onto $S_2$) + 0.063 (projection onto $S_2^\perp$); Noise $z$ = 0.043 (projection onto $S_2$) + 0.144 (projection onto $S_2^\perp$); Data $y$ = Signal + Noise; Estimate $P_{S_2} y$]

$P_{S_1} z$ and $P_{S_2} z$
[figure: the noise projections]
$$0.007 = \frac{\|P_{S_1} z\|_2}{\|x\|_2} < \frac{\|P_{S_2} z\|_2}{\|x\|_2} = 0.043$$
$$\frac{0.043}{0.007} = 6.14 \approx \sqrt{\frac{\dim(S_2)}{\dim(S_1)}} \quad \text{(not a coincidence)}$$

$P_{S_1^\perp} x$ and $P_{S_2^\perp} x$
[figure: the projections of the signal onto the two orthogonal complements, with relative norms $\|P_{S^\perp} x\|_2 / \|x\|_2$ of 0.063 and 0.190]

$P_{S_1} y$ and $P_{S_2} y$
[figure: the original image $x$, the estimate $P_{S_1} y$, and the estimate $P_{S_2} y$]