Representation of objects

For searching/mining, we first need to represent the objects:
- Images/videos: MPEG features
- Graphs: matrix
- Text documents: tf-idf vector, bag of words
- Kernels

How to compare objects:
- Distance measure
- Similarity measure
- Vector/metric spaces
- Composition of spaces
- Variety of data; dynamic data

Transformations
- Hierarchy of spaces: vector space, metric space, normed linear space, inner product space (e.g., R^n)
- Dimensionality reduction: Fourier, wavelet, SVD, ...
- Embedding of objects into Euclidean spaces, typically from metric spaces (e.g., spectral analysis of graphs)
Vector spaces

Vector addition: commutative, associative, identity, inverse.
Scalar multiplication (scalars may be complex):
- Associative: (cd)x = c(dx)
- Identity: 1x = x
- Distributive: (c + d)x = cx + dx and c(x + y) = cx + cy
Subspace: a subset closed under vector addition and scalar multiplication.

References
http://mathworld.wolfram.com/vectorspace.html
http://www.math.ohio-state.edu/~gerlach/math/bvtypset/bvtypset.html
Linear Algebra and Its Applications, G. Strang, Brooks/Cole, 3rd edition

Definitions
Vectors v_1, v_2, ..., v_k are linearly independent iff c_1 v_1 + c_2 v_2 + ... + c_k v_k = 0 only when c_1 = c_2 = ... = c_k = 0.
Given a set of vectors v_1, v_2, ..., v_k, their span is the vector space generated by their linear combinations.
A basis for a vector space V is a set of vectors that are linearly independent and that span V.
The cardinality of a basis of a vector space is called its dimension. Bases can differ, but they all have the same cardinality, so the dimension is well defined.

Normed linear (vector) space
The norm of a vector is a measure of its size, denoted ||v||. It has the following properties:
- ||v|| >= 0, and ||v|| = 0 iff v = 0
- ||cv|| = |c| ||v||  (|c| is the magnitude of the scalar c)
- ||u + v|| <= ||u|| + ||v||  (triangle inequality)
A normed linear space is a vector space with a norm. Examples:
- The vector space of matrices with ||A|| = the maximum absolute value of the elements of A
- The vector space of infinite sequences satisfying the convergence condition (Σ |x_k|^p)^(1/p) < ∞, p >= 1

Inner product space
Can be used to compare objects. An inner product <u,v> is a binary operator taking a pair of complex (real) vectors to a complex (real) scalar such that
- <u,u> is real and >= 0
- <u,u> = 0 iff u = 0
- <u,v> = <v,u>*  (complex conjugate)
- <bu + cv, w> = b<u,w> + c<v,w>
An inner product space is a vector space with an inner product.
An inner product defines a norm: ||v|| = sqrt(<v,v>). This norm satisfies the three conditions on the previous slide (proof follows). It also satisfies the Schwarz inequality: |<u,v>| <= ||u|| ||v|| (proof follows).
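As a quick sanity check, the norm induced by the real dot product can be verified numerically against the properties above (a minimal sketch in plain Python; the vectors are arbitrary illustrative values):

```python
import math

def inner(u, v):
    # Real inner product: the dot product <u, v> = sum of u_i * v_i
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    # Norm induced by the inner product: ||v|| = sqrt(<v, v>)
    return math.sqrt(inner(v, v))

u = [1.0, 2.0, -1.0]
v = [3.0, 0.0, 4.0]

# ||cv|| = |c| ||v||
assert math.isclose(norm([5 * a for a in v]), 5 * norm(v))
# Triangle inequality: ||u + v|| <= ||u|| + ||v||
w = [a + b for a, b in zip(u, v)]
assert norm(w) <= norm(u) + norm(v)
# Schwarz inequality: |<u, v>| <= ||u|| ||v||
assert abs(inner(u, v)) <= norm(u) * norm(v)
```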
Proof of Schwarz inequality
For any scalar x:
0 <= <u - xv, u - xv> = <u,u> - x<v,u> - x*<u,v> + |x|^2 <v,v>
Choose x = <u,v> / <v,v>. Then
0 <= <u,u> - |<u,v>|^2 / <v,v>
so |<u,v>| <= ||u|| ||v||.

Proof of triangle inequality
||u + v||^2 = ||u||^2 + ||v||^2 + <u,v> + <v,u>   (definition of ||u + v||)
            = ||u||^2 + ||v||^2 + 2 Re <u,v>      (complex conjugates)
            <= ||u||^2 + ||v||^2 + 2 |<u,v>|
            <= ||u||^2 + ||v||^2 + 2 ||u|| ||v||  (Schwarz inequality)
            = (||u|| + ||v||)^2
Note that if <u,v> = 0 then ||u + v||^2 = ||u||^2 + ||v||^2 (Pythagoras' theorem), and likewise ||u - v||^2 = ||u||^2 + ||v||^2.

Metric space
A metric space comes with a non-negative metric function d such that
- d(x,x) = 0
- d(x,y) = d(y,x)
- d(x,y) <= d(x,z) + d(z,y)  (triangle inequality)
Every normed linear space is a metric space with d(x,y) = ||x - y||.
Every inner product space is a normed linear space, hence also a metric space.

Complete spaces
A sequence f_1, f_2, ... converges to the limit f provided for any ε > 0 there exists N such that for all n > N, d(f_n, f) < ε.
A sequence is called a Cauchy sequence provided for any ε > 0 there exists N such that for all m, n > N, d(f_m, f_n) < ε.
Every convergent sequence is a Cauchy sequence, since d(f_m, f_n) <= d(f_m, f) + d(f_n, f).
The limit of a Cauchy sequence may not exist in the metric space (e.g., the rational numbers).
A complete metric space is a metric space in which the limit of every Cauchy sequence lies in the space.
Similarly: a complete inner product space is a Hilbert space; a complete normed linear space is a Banach space.
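The choice x = <u,v>/<v,v> in the Schwarz proof is exactly the projection coefficient of u onto v: the residual u - xv is then orthogonal to v, and the leftover <u,u> - <u,v>^2/<v,v> is the squared length of that residual. A small numeric check (illustrative real vectors):

```python
import math

def inner(u, v):
    # Real dot product
    return sum(a * b for a, b in zip(u, v))

u = [2.0, 1.0, 3.0]
v = [1.0, -1.0, 2.0]

# Projection coefficient from the proof: x = <u,v> / <v,v>
x = inner(u, v) / inner(v, v)
residual = [a - x * b for a, b in zip(u, v)]

# The residual u - xv is orthogonal to v ...
assert math.isclose(inner(residual, v), 0.0, abs_tol=1e-12)
# ... and <u,u> - <u,v>^2/<v,v> = <residual, residual> >= 0,
# which is the Schwarz inequality in disguise.
slack = inner(u, u) - inner(u, v) ** 2 / inner(v, v)
assert math.isclose(slack, inner(residual, residual))
assert slack >= 0.0
```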
Real vector space: R^n
- n basis vectors
- Inner product <u,v> = the dot product u^T v
- Length of a vector: ||v|| = sqrt(v^T v)
- Angle θ between two vectors u and v: cos θ = <u,v> / (||u|| ||v||)
  ||u - v||^2 = ||u||^2 + ||v||^2 - 2 ||u|| ||v|| cos θ  (property of triangles)
  ||u - v||^2 = (u - v)^T (u - v) = ||u||^2 + ||v||^2 - 2 u^T v
- Schwarz inequality: |u^T v| <= ||u|| ||v||
- Hölder's inequality: |u^T v| <= ||u||_p ||v||_q provided 1/p + 1/q = 1 and 1 < p, q < ∞
- L_p (Minkowski) distance metrics: ||x||_p = (Σ |x_k|^p)^(1/p) is a norm for p >= 1
- p-mean: ((Σ |x_k|^p)/n)^(1/p), where n is the size of the vector
  p = -1: harmonic mean; p → 0: geometric mean; p = 1: arithmetic mean; p = 2: quadratic (Euclidean) mean; p = ∞: max; p = -∞: min

Generalized p-means
(Generalization of the AM-GM inequality) Let x_i > 0, λ_i in [0,1], Σ λ_i = 1. Then Π x_i^{λ_i} <= Σ λ_i x_i. The proof follows from convexity of the exponential function.
[Figure: a visual proof that sqrt(ab) <= (a + b)/2, placing log a, (log a + log b)/2, and log b on the curve y = e^x.]

Generalized p-means (continued)
Let x_i > 0, λ_i in [0,1], Σ λ_i = 1. If p <= q then (Σ λ_i x_i^p)^(1/p) <= (Σ λ_i x_i^q)^(1/q) for all non-zero p, q.
If p = 1 and q > 1, the inequality becomes (Σ λ_i x_i)^q <= Σ λ_i x_i^q, and the proof follows from convexity of x^q.
If 0 < p < q, apply the above inequality with exponent q/p. Similar analysis covers the other cases.

Properties of Minkowski norms
[Figure: unit circles of the Minkowski distance for p = 1, 2, 3, ∞.]
For 1 <= p <= q < ∞, ||x||_p >= ||x||_q. (Note the difference from p-means.) Proof:
||x||_p >= ||x||_q
⇔ (Σ |x_k|^p)^(1/p) >= (Σ |x_k|^q)^(1/q)
⇔ (Σ |x_k|^p)^(q/p) >= Σ (|x_k|^p)^(q/p)
Therefore it suffices to show that (Σ u_k)^r >= Σ u_k^r for r >= 1, u_k >= 0, which follows from expanding the LHS.
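The monotonicity claim, ||x||_p >= ||x||_q for p <= q, is easy to check numerically (a minimal sketch; the vector is an arbitrary illustrative value):

```python
import math

def p_norm(x, p):
    # Minkowski norm ||x||_p = (sum |x_k|^p)^(1/p); p = inf gives max |x_k|
    if p == math.inf:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3.0, -4.0, 1.0]

norms = [p_norm(x, p) for p in (1, 2, 3, math.inf)]
# For p <= q, ||x||_p >= ||x||_q: the norms are non-increasing in p
assert all(a >= b for a, b in zip(norms, norms[1:]))
```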
Properties of Minkowski norms (continued)
If p >= 1 then ||x||_p satisfies the triangle inequality. We need to show that ||u + v||_p <= ||u||_p + ||v||_p.
Since |x|^p is convex,
||(1-λ)u + λv||_p^p = Σ |(1-λ)u_i + λv_i|^p <= Σ ((1-λ)|u_i|^p + λ|v_i|^p) = (1-λ)||u||_p^p + λ||v||_p^p.
So, for unit vectors u and v, ||(1-λ)u + λv||_p <= 1. Then
||u + v||_p / (||u||_p + ||v||_p) = || u/(||u||_p + ||v||_p) + v/(||u||_p + ||v||_p) ||_p
= || (||u||_p/(||u||_p + ||v||_p)) (u/||u||_p) + (||v||_p/(||u||_p + ||v||_p)) (v/||v||_p) ||_p <= 1.

Other spaces of interest
l_p = the set of all infinite sequences of real (or complex) numbers whose p-norm is finite: an extension from finite vector spaces to countably infinite dimensions. These spaces are nested, i.e., l_1 ⊆ l_2 ⊆ l_3 ⊆ ... ⊆ l_∞.
L_p = the set of functions whose pth powers are integrable: f : X → R such that ∫ |f|^p dμ < ∞, with the p-norm of the function defined as ||f||_p = (∫ |f|^p dμ)^(1/p).
The norm for l_p generalizes the finite p-norm, while the norm for L_p generalizes the p-mean: for 1 <= p <= q < ∞, ||f||_p <= ||f||_q.
Each L_p is complete.

Metric for color spaces
Color similarity using a real, symmetric, positive semi-definite matrix A (no negative eigenvalues). The matrix A accounts for cross-talk among color bins:
d(u,v) = sqrt((u - v)^T A (u - v))   (quadratic-form distance)
Example: three bins (red, orange, blue); vectors u = (1,0,0), v = (0,1,0), w = (0,0,1); matrix
A = | 1    0.8  0 |
    | 0.8  1    0 |
    | 0    0    1 |
Compute the distances.

Is d(u,v) a metric?
x^T A x = Σ λ_i y_i^2, where A = Σ λ_i q_i q_i^T  /* spectral decomposition for real, symmetric A: orthonormal eigenvectors, real eigenvalues */
λ_i is the ith eigenvalue, q_i is the corresponding normalized eigenvector, and y_i = x^T q_i is the length of the projection of x along q_i.
Writing u_i, v_i for the coordinates in the eigenbasis:
d(u,v) = sqrt((u - v)^T A (u - v)) = sqrt(Σ λ_i (u_i - v_i)^2)
- d(u,v) >= 0, d(u,u) = 0, d(u,v) = d(v,u)
- d(u,w) <= d(u,v) + d(v,w) provided
  sqrt(Σ λ_i (u_i - w_i)^2) <= sqrt(Σ λ_i (u_i - v_i)^2) + sqrt(Σ λ_i (v_i - w_i)^2)
  i.e., sqrt(Σ (√λ_i u_i - √λ_i w_i)^2) <= sqrt(Σ (√λ_i u_i - √λ_i v_i)^2) + sqrt(Σ (√λ_i v_i - √λ_i w_i)^2),
which is the triangle inequality for the L_2 norm (valid since λ_i >= 0).
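The color-space exercise can be worked through directly. Note that the diagonal entries of A below are an assumption (the slide's matrix print is partly garbled; a unit diagonal with 0.8 red-orange cross-talk matches the shown off-diagonal values):

```python
import math

# Quadratic-form color distance d(u,v) = sqrt((u-v)^T A (u-v)).
# Diagonal 1s are assumed (reconstruction); 0.8 models cross-talk
# between the red and orange bins, blue is independent.
A = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

def quad_dist(u, v):
    d = [a - b for a, b in zip(u, v)]
    # (u-v)^T A (u-v)
    q = sum(d[i] * A[i][j] * d[j] for i in range(3) for j in range(3))
    return math.sqrt(q)

u = (1, 0, 0)  # red
v = (0, 1, 0)  # orange
w = (0, 0, 1)  # blue

# Red ends up closer to orange than to blue, as the cross-talk intends
assert quad_dist(u, v) < quad_dist(u, w)
```

With these values, d(u,v) = sqrt(0.4) ≈ 0.63 while d(u,w) = d(v,w) = sqrt(2) ≈ 1.41, so the cross-talk term makes perceptually similar colors close.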
Predicting class membership: normalized Euclidean distance
Given a point x = (x_1, ..., x_k) and a set of points with centroid c = (c_1, ..., c_k):
- Normalize in each dimension: y_i = (x_i - c_i)/σ_i
- Distance of point x from the set = ||y||: ||y||^2 = Σ ((x_i - c_i)/σ_i)^2 = (x - c)^T S (x - c), where S is a diagonal matrix with entries of the form 1/σ_i^2
- Points with the same ||y|| value lie on an axis-oriented ellipsoid centered at c.

Predicting class membership: Mahalanobis distance
Distance of point x from the set = sqrt((x - c)^T S^{-1} (x - c)), where S is the covariance matrix: real, symmetric, and positive semi-definite (its inverse, when it exists, has the same properties).
S(i,j) = E[(Z_i - μ_i)(Z_j - μ_j)];  S(i,i) = σ_i^2 = E[(Z_i - μ_i)^2]
If each data item is placed in a row, the covariance of dimensions i and j = (Z_i - μ_i)^T (Z_j - μ_j)/n, where n is the number of data items (rows).
Compared with the normalized Euclidean distance, the Mahalanobis distance allows the axes of the ellipsoid to rotate: the eigenvectors of S determine the axes of the ellipsoid.

Orthonormal basis
A basis for an inner product space is orthonormal provided <f,g> = 0 for any pair of distinct vectors f and g in the basis, and <f,f> = 1. Only the first condition is needed for an orthogonal basis.
Orthogonal matrix: a square matrix whose columns are orthonormal.

Gram-Schmidt orthogonalization
A set of linearly independent vectors {u_i} can be transformed into a set {w_i} of orthonormal vectors:
v_1 = u_1,  w_1 = v_1 / ||v_1||
v_2 = u_2 - <u_2, w_1> w_1,  w_2 = v_2 / ||v_2||
...
v_k = u_k - Σ_{i=1}^{k-1} <u_k, w_i> w_i,  w_k = v_k / ||v_k||
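The Gram-Schmidt recipe above translates almost line by line into code (a minimal sketch for real vectors; the input vectors are illustrative):

```python
import math

def inner(u, v):
    # Real dot product
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Turn linearly independent vectors into an orthonormal set,
    following v_k = u_k - sum_i <u_k, w_i> w_i, then normalizing."""
    ws = []
    for u in vectors:
        v = list(u)
        # Subtract the projections onto the already-built w_i
        for w in ws:
            c = inner(u, w)
            v = [a - c * b for a, b in zip(v, w)]
        n = math.sqrt(inner(v, v))
        ws.append([a / n for a in v])
    return ws

ws = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])

# Check orthonormality: <w_i, w_j> = 1 if i == j else 0
for i in range(3):
    for j in range(3):
        expected = 1.0 if i == j else 0.0
        assert math.isclose(inner(ws[i], ws[j]), expected, abs_tol=1e-12)
```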
Properties of orthogonal matrices
Q^T Q = I (by definition).
Multiplication by an orthogonal matrix preserves length: ||Qx|| = ||x||.
Proof: ||Qx||^2 = (Qx)^T (Qx) = x^T Q^T Q x = x^T I x = x^T x = ||x||^2.
Multiplication by an orthogonal matrix preserves inner products and angles:
(Qx)^T (Qy) = x^T Q^T Q y = x^T I y = x^T y, and cos θ = <x,y> / (||x|| ||y||).

Hilbert space
A Hilbert space is a complete inner product space.
Finite dimensions:
- R^n with inner product = the dot product of u and v
- C^n with inner product = the dot product of u and the conjugate of v
Infinite dimensions:
- L^2(R) denotes the collection of all measurable functions f such that ∫_{-∞}^{∞} |f(x)|^2 dx < ∞: the space of square-integrable functions (a "vector" is a function here). Can be generalized to L^p(R). Define <f,g> = ∫ f(x) g(x) dx.
- L^2(0,2π) denotes the collection of all measurable functions f defined over the interval (0,2π) such that ∫_0^{2π} |f(x)|^2 dx < ∞: the space of 2π-periodic square-integrable functions.

Properties of Hilbert spaces
Given a Hilbert space with orthonormal basis {e_i}:
- Fourier expansion: v = Σ <v,e_i> e_i
- Plancherel's theorem: <v,w> = Σ <v,e_i> <w,e_i>*
- Parseval's theorem: <v,v> = ||v||^2 = Σ |<v,e_i>|^2 = ||Σ <v,e_i> e_i||^2

Dimensionality reduction
Reduce the number of dimensions of the data:
- Reduced storage and computation
- Focus on the main trends
Project the d-dimensional points into a k-dimensional space so that:
- k << d
- distances are preserved as well as possible
Example: use the first few Fourier coefficients.
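The two preservation properties are easy to confirm with a concrete orthogonal matrix; a 2-D rotation is the simplest example (the angle and vectors are arbitrary illustrative values):

```python
import math

# A 2-D rotation matrix is orthogonal: Q^T Q = I
theta = 0.7  # arbitrary angle
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x = [3.0, 4.0]
y = [-1.0, 2.0]
Qx, Qy = matvec(Q, x), matvec(Q, y)

# Length is preserved: ||Qx||^2 = ||x||^2
assert math.isclose(dot(Qx, Qx), dot(x, x))
# Inner products (hence angles) are preserved: (Qx)^T (Qy) = x^T y
assert math.isclose(dot(Qx, Qy), dot(x, y))
```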
Embeddings
Given a distance d, embed the objects into a space of smaller dimension using a mapping F and a distance d' such that d(i,j) is close to d'(F(i), F(j)).
- Isometric mapping: exact preservation of distances
- Contractive mapping: d'(F(i), F(j)) <= d(i,j)
- NN and range queries can then be answered in the reduced space.
Examples: d(a,b) = d(a,c) = d(b,c) = 2, d(a,e) = d(b,e) = d(c,e) = 1.
- Is there an isometric embedding into 3-d space using L_2?
- Is there an isometric embedding into 3-d space using L_1?
- Isometric embedding of n points into n-dimensional space using L_∞: F(i) = the distance vector of object i.

Fourier analysis
Analysis of functions or signals in frequency space:
- Continuous aperiodic signal: Fourier transform
- Continuous periodic signal: Fourier series
- Discrete periodic signal: discrete Fourier transform
In each case there are orthonormal basis functions and inner products. Coefficients may be complex even for functions in a real space. The inner product is based on summation/integration of the function against the basis elements.

Fourier transform
The set {(1/√(2π)) e^{j2πkx} : k ∈ R} forms an orthonormal set of basis functions for L^2(R). The resulting decomposition is called the Fourier transform:
F(k) = 1/√(2π) ∫ e^{-j2πkx} f(x) dx   (forward)
f(x) = 1/√(2π) ∫ e^{j2πkx} F(k) dk    (inverse)
Sometimes 2πk is replaced by ω (angular frequency) and x by t (time):
F(ω) = 1/√(2π) ∫ e^{-jωt} f(t) dt
f(t) = 1/√(2π) ∫ e^{jωt} F(ω) dω
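The last embedding claim, F(i) = the vector of distances from object i to all n objects, can be checked directly: under the L_∞ (Chebyshev) distance the triangle inequality gives ||F(i) - F(j)||_∞ <= d(i,j), while the jth coordinate alone contributes |d(i,j) - 0| = d(i,j), so the embedding is isometric. A sketch with the slide's 4-point example:

```python
# Isometric embedding of a finite metric space into L_infinity:
# F(i) = row i of the distance matrix. Illustrated on the example
# above: a, b, c pairwise at distance 2, e at distance 1 from each.
names = ["a", "b", "c", "e"]
D = [[0, 2, 2, 1],
     [2, 0, 2, 1],
     [2, 2, 0, 1],
     [1, 1, 1, 0]]

def linf(u, v):
    # Chebyshev (L_infinity) distance
    return max(abs(a - b) for a, b in zip(u, v))

# F(i) is simply the ith row of D; verify d(i,j) = ||F(i) - F(j)||_inf
for i in range(4):
    for j in range(4):
        assert linf(D[i], D[j]) == D[i][j]
```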
Fourier series
Orthonormal basis functions: (1/√(2π)), (1/√π) cos x, (1/√π) sin x, (1/√π) cos 2x, (1/√π) sin 2x, ...
For any function f(x) ∈ L^2(0,2π), the Fourier series is
f(x) = a_0 (1/√(2π)) + a_1 (1/√π) cos x + b_1 (1/√π) sin x + a_2 (1/√π) cos 2x + b_2 (1/√π) sin 2x + ...
with coefficients
a_0 = <f, 1/√(2π)> = 1/√(2π) ∫ f(x) dx
a_m = <f, (1/√π) cos mx> = 1/√π ∫ f(x) cos(mx) dx
b_m = <f, (1/√π) sin mx> = 1/√π ∫ f(x) sin(mx) dx
Substituting the coefficients back:
f(x) = 1/(2π) ∫ f(x) dx + (1/π) (∫ f(x) cos x dx) cos x + (1/π) (∫ f(x) sin x dx) sin x + (1/π) (∫ f(x) cos 2x dx) cos 2x + (1/π) (∫ f(x) sin 2x dx) sin 2x + ...

Basis elements for Fourier series
[Figure: plots of the basis functions.]

Fourier approximation
[Figure: successive Fourier approximations of a signal.]
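The coefficient formulas can be verified numerically against a test function whose coefficients are known in closed form. For f(x) = sin x + 0.5 cos 2x on (0, 2π), the orthonormal-basis coefficients are b_1 = √π and a_2 = 0.5√π, all others zero (a sketch; the Riemann sum over a full period is very accurate for smooth periodic integrands):

```python
import math

# Test function with known Fourier coefficients
def f(x):
    return math.sin(x) + 0.5 * math.cos(2 * x)

N = 4096  # integration resolution
dx = 2 * math.pi / N
xs = [k * dx for k in range(N)]

def coeff(basis):
    # <f, basis> approximated by a Riemann sum over (0, 2*pi)
    return sum(f(x) * basis(x) for x in xs) * dx

a0 = coeff(lambda x: 1 / math.sqrt(2 * math.pi))
b1 = coeff(lambda x: math.sin(x) / math.sqrt(math.pi))
a2 = coeff(lambda x: math.cos(2 * x) / math.sqrt(math.pi))

# Exact values: a0 = 0, b1 = sqrt(pi), a2 = 0.5*sqrt(pi)
assert math.isclose(a0, 0.0, abs_tol=1e-9)
assert math.isclose(b1, math.sqrt(math.pi), rel_tol=1e-6)
assert math.isclose(a2, 0.5 * math.sqrt(math.pi), rel_tol=1e-6)
```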
Fourier approximation
[Figure: Fourier approximation example.]

Frequency analysis
The independent variable is time; the dependent variable is the amplitude. Most of the information is hidden in the frequency content.
[Figure: sinusoids at 2 Hz, 10 Hz, and 20 Hz, and their sum 2 Hz + 10 Hz + 20 Hz, plotted as magnitude vs. time.]

Discrete Fourier Transform (DFT)
Given an input sequence (x_0, x_1, x_2, ..., x_{n-1}), transform it into the frequency domain so that we have equality at n points. Only n basis elements are needed.
Orthonormal basis functions, where ω is the nth primitive root of unity = e^{2πj/n} = cos(2π/n) + j sin(2π/n):
(1/√n) [1, 1, ..., 1]^T
(1/√n) [1, ω, ω^2, ..., ω^{n-1}]^T
(1/√n) [1, ω^2, ω^4, ..., ω^{2(n-1)}]^T
...
(1/√n) [1, ω^{n-1}, ω^{2(n-1)}, ..., ω^{(n-1)^2}]^T
Proof of orthonormality: remember to take the conjugate; illustrate on the unit circle.
Example: vector (2,4,6,8). Find coefficients such that (1/√4)[c_0 + c_1 e^{jx} + c_2 e^{j2x} + c_3 e^{j3x}] = (2,4,6,8) at x = (0, π/2, π, 3π/2).

Discrete Fourier Transform (continued)
Given input x = (x_0, x_1, ..., x_{n-1}), the DFT produces X = (X_0, X_1, ..., X_{n-1}), where X_f (the projection along the fth basis element) is given by
X_f = (1/√n) Σ_{t=0}^{n-1} x_t ω^{-ft} = (1/√n) Σ_{t=0}^{n-1} x_t e^{-j2πft/n},  for f = 0, 1, ..., n-1
X_0 = (1/√n) Σ_{t=0}^{n-1} x_t
X_1 = (1/√n) Σ_{t=0}^{n-1} x_t e^{-j2πt/n} = (1/√n) Σ x_t (cos(2πt/n) - j sin(2πt/n))
X_2 = (1/√n) Σ_{t=0}^{n-1} x_t e^{-j4πt/n} = (1/√n) Σ x_t (cos(4πt/n) - j sin(4πt/n))
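The (2,4,6,8) example can be worked with a direct implementation of the unitary DFT defined above (O(n^2), for illustration only; FFT libraries often move the scaling entirely to the inverse transform instead of splitting 1/√n between the two directions):

```python
import cmath
import math

def dft(x):
    """Unitary DFT with the 1/sqrt(n) scaling used above:
    X_f = (1/sqrt(n)) * sum_t x_t * e^{-j*2*pi*f*t/n}."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                for t in range(n)) / math.sqrt(n)
            for f in range(n)]

X = dft([2, 4, 6, 8])

# X_0 is the scaled sum of the samples: (2+4+6+8)/sqrt(4) = 10
assert math.isclose(X[0].real, 10.0)
assert math.isclose(X[0].imag, 0.0, abs_tol=1e-12)
# X_1 works out to -2 + 2j for this input
assert math.isclose(X[1].real, -2.0) and math.isclose(X[1].imag, 2.0)
```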
Inverse transform
Given X = (X_0, X_1, ..., X_{n-1}), the inverse transform produces x = (x_0, x_1, ..., x_{n-1}), where x_t is given by
x_t = (1/√n) Σ_{f=0}^{n-1} X_f ω^{ft} = (1/√n) Σ_{f=0}^{n-1} X_f e^{j2πft/n},  t = 0, 1, ..., n-1
x_0 = (1/√n) Σ_{f=0}^{n-1} X_f
x_1 = (1/√n) Σ_{f=0}^{n-1} X_f e^{j2πf/n} = (1/√n) Σ X_f (cos(2πf/n) + j sin(2πf/n))
x_2 = (1/√n) Σ_{f=0}^{n-1} X_f e^{j4πf/n} = (1/√n) Σ X_f (cos(4πf/n) + j sin(4πf/n))

Fourier matrix and its inverse
F_{jk} = (1/√n) ω^{jk}; the inner product with this matrix defines the forward transform (analysis).
F^{-1}_{jk} = (1/√n) ω^{-jk}; the inner product with this matrix defines the inverse transform (synthesis).
Exercise: prove that F F^{-1} = I.

Properties
Parseval's theorem: energy in the time domain = ||x||^2 = Σ |x_t|^2 = energy in the frequency domain = Σ |X_f|^2.
||x - y||^2 = ||X - Y||^2 = Σ |X_f - Y_f|^2.
Convolution in the time domain is multiplication in the frequency domain.
FFT: time complexity O(n log n).
How to build an index? Which coefficients? Searching in a reduced-dimensional space.
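Both Parseval's theorem and the round-trip identity F F^{-1} = I can be checked numerically under the unitary 1/√n scaling (a sketch; the input vector is illustrative):

```python
import cmath
import math

def dft(x, sign=-1):
    # Unitary DFT (sign=-1) and its inverse (sign=+1), 1/sqrt(n) scaling
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n)
                for t in range(n)) / math.sqrt(n)
            for f in range(n)]

x = [2, 4, 6, 8]
X = dft(x)

# Parseval: energy is identical in both domains under unitary scaling
energy_time = sum(abs(v) ** 2 for v in x)
energy_freq = sum(abs(v) ** 2 for v in X)
assert math.isclose(energy_time, energy_freq)

# The inverse transform recovers the input: F F^{-1} = I
x_back = dft(X, sign=+1)
assert all(math.isclose(v.real, w, abs_tol=1e-9) for v, w in zip(x_back, x))
assert all(abs(v.imag) < 1e-9 for v in x_back)
```

This distance preservation (||x - y||^2 = ||X - Y||^2) is exactly what justifies indexing on the first few Fourier coefficients: dropping coefficients only shrinks distances, so the reduced-space search is contractive and misses no true matches.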