Indexing of Variable Length Multi-attribute Motion Data


Chuanjun Li, Gaurav Pradhan, S.Q. Zheng, B. Prabhakaran
Department of Computer Science, University of Texas at Dallas, Richardson, TX 7583
{chuanjun, gnp21, sizheng,

ABSTRACT

Haptic data such as 3D motion capture data and sign language animation data are new forms of multimedia data. Motion data is multi-attribute, and indexing of multi-attribute data is important for quickly pruning the majority of irrelevant motions in order to support real-time animation applications. Indexing of multi-attribute data has previously been attempted for data with only a few attributes, by using an R-tree or its variants after dimensionality reduction. In this paper, we exploit the singular value decomposition (SVD) properties of multi-attribute motion data matrices to obtain one representative vector for each motion data matrix of dozens or hundreds of attributes. Based on this representative vector, we propose a simple and efficient interval-tree based index structure for indexing motion data with a large number of attributes. At each tree level, only one component of the query vector needs to be checked during searching, compared to all the components of the query vector that would be involved if an R-tree or one of its variants were used for indexing. Searching time is independent of the number of pattern motions indexed by the tree, making the index structure highly scalable to large data repositories. Experiments show that up to 91-93% of irrelevant motions can be pruned for a query with no false dismissals, and the query searching time is less than 3 µs even in the presence of motion variations.

Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing—Indexing methods

General Terms: Algorithms

Work supported partially by the National Science Foundation under Grant No for the project CAREER: Animation Databases.
Any opinions, findings, and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MMDB'04, November 13, 2004, Washington, DC, USA. Copyright 2004 ACM ...$5.00.

Keywords: indexing, dimensionality reduction, similarity, singular value decomposition, multi-attribute motion.

1. INTRODUCTION

Haptic data such as 3D motion capture data and sign language animation data are new forms of multimedia data. For instance, a gesture sensing device such as a CyberGlove with dozens of sensors generates multi-attribute data when a person wearing the glove moves his/her hand to create sign language for the hearing impaired. Each sensor has one reading at each sampling time, and the data corresponding to one sign (e.g., the sign for TV or Milk) forms a matrix with multiple columns, or attributes, and hundreds or thousands of rows. Each row corresponds to one sampling of all the sensors. In the case of 3D motion capture data, a motion such as walking or running has to be captured using dozens or even hundreds of sensors, so 3D motion capture data can have hundreds of attributes corresponding to the sensors placed over different parts of a moving object, and thousands of rows of readings. For many of these multimedia applications, more than one value is generated at each time instant, rather than the single value per instant of a time series data sequence.
As a result, the multi-attribute data of one motion forms a matrix, rather than a multidimensional vector as in a time series sequence. For example, the multi-attribute motion data used in this paper are matrices, each of which represents one motion and has 22 columns and hundreds or thousands of rows. The matrices of motion data can be of variable lengths, due to the fact that motions can be carried out at different speeds at different times, can have different durations, and may be sampled at different rates. Hence there are no continuous row-to-row correspondences between the data of similar motions, as shown in Figure 1. For similar motions, corresponding attributes can have large differences in values, and different corresponding attribute pairs can have different variations at different times due to local speed differences. These properties of multi-attribute motion data make the data difficult to index efficiently, and no existing indexing technique can index multi-attribute motion data directly or efficiently. Indexing should help prune the majority of irrelevant data quickly for a query, and that is the purpose of our proposed indexing structure for multi-attribute motion data.

The first right singular vectors of multi-attribute data provide the essential geometric structures of the motion data matrices. Section 2 reviews related work, and section 4 generates indexing vectors from the first right singular vectors. Section 5 proposes an interval-based tree index structure, followed by section 6, which experimentally evaluates the pruning efficiency of the proposed indexing structure. Finally, section 7 concludes the paper.

Figure 1: Data for Two Similar Motions. Similar motion data can have different lengths, and different corresponding attribute pairs can have different variations at different times, hence there are no row-to-row correspondences between data of similar motions, making it difficult to index multi-attribute motion data.

Proposed Approach: We assume that all motions, including the pattern motions to be indexed in a database and any query motions, have the same number of attributes. An important observation is that the geometric structures of the data matrices are similar for similar motions. Based on this observation, we propose to index multi-attribute matrix data according to the geometric structures of the data matrices. Since Singular Value Decomposition (SVD) exposes the geometric structures of motion data matrices, as explored in Section 3, the multi-attribute motion data matrices are decomposed using SVD, and the resulting first right singular vectors are employed to index the corresponding matrices. Since the first right singular vectors share a short, equal length, their dimensionalities can be further reduced using SVD. Based on the distribution of the reduced first singular vectors (or their negations) of similar motions, a simple and efficient interval-tree based index structure is proposed for indexing the reduced first singular vectors of the pattern motions in the database.
After a query matrix has been transformed into a reduced singular vector, only one component of the reduced singular vector needs to be checked at each tree level during searching, compared to all the components of the query vector that would be involved with an R-tree or one of its variants. Experiments show that up to 91-93% of irrelevant motions can be pruned for a query, the query searching time can be less than 3 µs even with reasonably large variations in similar motions, and the searching time is independent of the number of pattern motions in the database. The rest of the paper is organized as follows. Section 3 gives the background information about SVD for its use in the proposed index structure, and shows that the first right singular vectors expose the essential geometric structures of the motion data matrices.

2. RELATED WORK

For time series sequences, which are multidimensional vectors, multi-dimensional index structures such as the R-tree [9] or its variants [2, 11] have been used to index dimensionality-reduced data. Dimensionality reduction techniques include the Discrete Fourier Transform (DFT) [1, 7], the discrete Haar Wavelet Transform (DWT) [5, 6, 16], Piecewise Aggregate Approximation (APP) [12], Adaptive Piecewise Constant Approximation (APCA) [4], and Singular Value Decomposition (SVD) [3, 4, 12]. Any signal, or time series of length n, can be decomposed into a superposition of n sine/cosine waves, and many of the waves, each represented by a Fourier coefficient, have very low amplitude and contribute little to reconstructing the original signal. DFT [1, 7] reduces the dimensionality of time series data by retaining only the first few coefficients. To use DWT to reduce the dimensionality of time series data, Haar wavelet functions are used to represent the data in terms of sums and differences of a mother wavelet, and some of the wavelet coefficients can be truncated to reduce the dimensionality.
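The Haar coefficient-truncation idea can be sketched as follows (a self-contained illustration under our own conventions, not any of the cited implementations): repeatedly store pairwise averages and differences, then keep only the largest-magnitude coefficients.

```python
import numpy as np

def haar_transform(x):
    """One full Haar DWT of x (length must be a power of two):
    repeatedly store pairwise averages and differences."""
    x = np.asarray(x, dtype=float).copy()
    coeffs = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / 2.0
        diff = (x[0::2] - x[1::2]) / 2.0
        coeffs.append(diff)   # detail coefficients at this scale
        x = avg               # recurse on the coarser approximation
    coeffs.append(x)          # overall average
    return np.concatenate(coeffs[::-1])

def truncate(coeffs, k):
    """Dimensionality reduction: keep the k largest-magnitude
    coefficients and zero out the rest."""
    out = np.zeros_like(coeffs)
    keep = np.argsort(np.abs(coeffs))[-k:]
    out[keep] = coeffs[keep]
    return out
```

Note how the power-of-two length requirement falls directly out of the pairwise averaging step.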
One drawback of DWT is that the data sequence length is required to be an integral power of two, which is too restrictive for practical scenarios. APP [12] approximates the data by segmenting the sequences into equi-length sections and indexes the mean values of the sections in a lower-dimensional space, while APCA [4] approximates each time series by a set of constant-value segments of varying lengths. To use SVD for dimensionality reduction, the SVD of a matrix composed of the pattern time series is computed and used to transform a query series. The transformed query series is truncated with only a few components retained. Although it requires all sequences to have equal lengths and its computation is costly (if the time series are very long), SVD reduces dimensionality optimally for linear projections. Among these techniques, APCA demonstrated superiority over the others, while SVD can provide better pruning efficiency than the other techniques and scales up well to large databases [12]. Equal-length multi-attribute sequences are considered in [1]. A CS-index structure is proposed for shift and scale transformations. The I/O overhead of the CS-index increases with the number of attributes, making the index structure non-scalable to sequences with many attributes. Similarity search over multi-attribute sequences of different lengths cannot be handled by the distance definitions and the CS-index proposed in [1]. In [14], multi-attribute sequences are partitioned into subsequences, each of which is contained in a Minimum Bounding Rectangle (MBR). Every MBR is indexed and stored in a database using an R-tree or any of its variants. Partitioning of a sequence involves the MBRs of other sequences in the database, making the proposed method non-scalable to large databases. If two sequences are of different lengths,

the shorter sequence is compared with the other by sliding it from the beginning to the end of the longer sequence. This approach is not suitable for recognizing similar sequences of different lengths or similar sequences with different local speeds. Dynamic time warping (DTW) and the longest common subsequence (LCS) are extended for similarity measures of multi-attribute data in [19]. Before the exact LCS or DTW is performed, sequences are segmented into MBRs to be stored in an R-tree. Based on the MBR intersections, similarity estimates are computed to prune irrelevant sequences. Both DTW and LCS have a computational complexity of O(wd(m + n)), where w is a matching window size, d is the number of attributes, and m, n are the lengths of the two data sequences. When w is a significant portion of m or n, the computation of DTW and LCS can be quadratic in the length of the sequences. This makes DTW and LCS non-scalable to large databases with long multi-attribute sequences. It has been shown in [19] that the index performance degrades significantly when the warping length increases. Even for as few as 2 MBRs per long sequence, the index space requirements can be about a quarter of the dataset size. In [18], a weighted-sum SVD is defined as below for measuring the similarity of two multi-attribute motion sequences:

Ψ(Q, P_k) = min(Ψ_1(M_1, M_2), Ψ_2(M_1, M_2))

where M_1 = Q^T Q, M_2 = P_k^T P_k, Ψ_1(M_1, M_2) = (Σ_{i=1}^n σ_i (u_i · v_i)) / (Σ_{i=1}^n σ_i), Ψ_2(M_1, M_2) = (Σ_{i=1}^n λ_i (u_i · v_i)) / (Σ_{i=1}^n λ_i), u_i and v_i are the respective i-th right singular vectors of M_1 and M_2, and σ_i and λ_i are the respective i-th largest singular values of M_1 and M_2. The weighted-sum SVD similarity measure computes the inner products of all the corresponding singular vectors of the two matrices, weighted by their singular values, and takes the minimum of the two weighted sums as the similarity of the matrices.
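Our reading of this weighted-sum SVD measure can be sketched as follows (a hedged NumPy sketch, not the implementation of [18]; variable names are ours):

```python
import numpy as np

def weighted_sum_svd(Q, P):
    """Weighted-sum SVD similarity of two multi-attribute sequences.

    M1 = Q^T Q and M2 = P^T P are n x n regardless of sequence
    length, so sequences of different lengths can be compared.
    """
    M1, M2 = Q.T @ Q, P.T @ P
    _, sig, Ut = np.linalg.svd(M1)   # rows of Ut are u_i, values sig_i
    _, lam, Vt = np.linalg.svd(M2)   # rows of Vt are v_i, values lam_i
    dots = np.sum(Ut * Vt, axis=1)   # u_i . v_i for each i
    psi1 = np.sum(sig * dots) / np.sum(sig)
    psi2 = np.sum(lam * dots) / np.sum(lam)
    return min(psi1, psi2)

# Toy sequence: 120 samples of 5 attributes.
rng = np.random.default_rng(2)
Q = rng.normal(size=(120, 5))
```

Comparing a sequence with itself yields 1; the sign ambiguity of singular vectors, discussed next, means the individual inner products can also be negative.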
This definition includes some noise components, since we observed that not all corresponding singular vectors of similar motions are close to each other, as shown in Figure 4. Hence, not all the inner products of singular vectors need be considered in the definition of the distance measure. The inner products of singular vectors can be both positive and negative, and the weights can be the singular values of either matrix. Hence, it is very likely that the weighted sum can drop or jump sharply even if a query approximately matches some pattern. Linear scanning is used in [18] to search for a similar sequence, which is computationally expensive when the number of sequences in the database is very large. The data indexed in previous research have fewer than ten attributes, whereas our proposed indexing structure can handle dozens or hundreds of data attributes.

3. BACKGROUND INFORMATION OF SVD

In this section, we give the definition and geometric interpretation of SVD for its application to the indexing of multi-attribute motion data. As proved in [8], for any real m × n matrix A, there exist orthogonal matrices U = [u_1, u_2, ..., u_m] ∈ R^{m×m} and V = [v_1, v_2, ..., v_n] ∈ R^{n×n} such that A = UΣV^T, where Σ = diag(σ_1, σ_2, ..., σ_{min(m,n)}) ∈ R^{m×n} and σ_1 ≥ σ_2 ≥ ... ≥ σ_{min(m,n)}. Here σ_i is the i-th singular value of A in non-increasing order, and the unit column vectors u_i and v_i are the i-th left singular vector and the i-th right singular vector of A for i = 1, ..., min(m, n), respectively. For similar motions with different lengths, the left singular vectors are of different lengths, but the right singular vectors are of equal length. The singular values of a matrix A are unique, and the singular vectors corresponding to distinct singular values are uniquely determined up to sign; that is, a singular vector can take either of two opposite signs [17]. SVD exposes the geometric structure of a matrix A.
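As a concrete illustration of the decomposition above (a minimal NumPy sketch, not from the paper), the first right singular vector is simply the first row of the V^T factor returned by `np.linalg.svd`:

```python
import numpy as np

def first_right_singular_vector(A):
    """Return v1, the first right singular vector of A.

    np.linalg.svd returns V^T with rows ordered by non-increasing
    singular value, so v1 is the first row of Vt.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[0]

# Toy "motion matrix": 18 rows (samples) of 2 attributes, stretched
# along the first attribute so v1 points roughly along the x-axis,
# as in Figure 2.
rng = np.random.default_rng(0)
A = rng.normal(size=(18, 2)) * np.array([5.0, 0.5])

v1 = first_right_singular_vector(A)
print(v1)  # unit vector along the direction of largest variation
```

The rows of A spread out most when projected onto v1; the remaining right singular vectors capture successively smaller variation.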
If the multi-dimensional row vectors, or points, in A have different variations along different directions, the SVD of matrix A can find the direction with the largest variation. Along the direction of the first right singular vector, the row vectors in A have the largest variation; along the second right singular vector direction, the point variation is the second largest, and so on. The singular values reflect the variations along the corresponding right singular vectors. Figure 2 illustrates the data in an 18 × 2 matrix together with its first and second right singular vectors v_1 and v_2. The 18 points in the matrix have different variations along different directions, and the data have the largest variation along v_1, as shown in Figure 2.

Figure 2: Geometric Structure of a Matrix Exposed by its SVD

4. INDEXING VECTOR GENERATION

In this section, we transform multi-attribute motion data into representative vectors for indexing.

4.1 Representative Vector Generation for Indexing

As with many similarity measures, motion matrices should have similar geometric structures if the corresponding motions are similar. Since the geometric similarity of matrix data can be captured by SVD, we propose to exploit SVD to generate representative vectors for motion matrices, and use these representative vectors as the basis for indexing the multi-attribute motion data. The motion patterns to be indexed in a database are assumed to be represented by matrices P_k, k = 1, 2, ..., p, where p is the total number of motion patterns in a database.

Figure 3: Inner Products of the First Singular Vectors of Similar Motions and Those of Dissimilar Motions

Let Q be the query motion. If the motions represented by Q and P_k are similar, their corresponding first right singular vectors u_1 and v_1 should be nearly parallel geometrically, so that u_1 · v_1 = ||u_1|| ||v_1|| cos(θ) = cos(θ) ≈ ±1, where θ is the angle between the two right singular vectors u_1 and v_1, and ||u_1|| = ||v_1|| = 1 by the definition of SVD. As Figure 3 shows, the inner products |u_1 · v_1| are very close to one when two motions are similar to each other, and are mostly less than 0.95 when two motions are not similar. This indicates that the first right singular vector (or its negation) of one motion is close to the first right singular vector of a similar motion, and that the two are most likely different when the motions are not similar. Other corresponding singular vectors may not be close to each other even if two motions are similar, as shown in Figure 4. This immediately suggests that right singular vectors other than the first might not reflect the similarity of two similar motions and cannot be used to index multi-attribute motions efficiently, while the first right singular vectors can. It is worth noting that for motions to be similar, other information should also be considered; for example, their corresponding singular values should also be proportional to each other, as shown in [15]. Although close first right singular vectors are only a necessary condition for similarity, they are sufficient for indexing purposes given a certain variation tolerance, as will be shown in Section 6.
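This necessary-condition test is easy to sketch (a NumPy illustration under our own assumptions; the 0.95 reading is taken from the Figure 3 observation above):

```python
import numpy as np

def first_rsv(A):
    """First right singular vector of a motion matrix (unit norm)."""
    return np.linalg.svd(A, full_matrices=False)[2][0]

def maybe_similar(Q, P, threshold=0.95):
    """Necessary condition for similarity: |u1 . v1| close to 1.

    The absolute value absorbs the sign non-uniqueness of singular
    vectors; Q and P may have different numbers of rows.
    """
    return abs(np.dot(first_rsv(Q), first_rsv(P))) > threshold

# Toy 22-attribute "motion" with a dominant variation direction.
rng = np.random.default_rng(1)
Q = rng.normal(size=(200, 22)) * np.linspace(3.0, 0.5, 22)
# A longer, slightly perturbed repetition of the same "motion":
# twice the rows, different length, nearly the same geometry.
P = np.vstack([Q, Q]) + rng.normal(scale=0.001, size=(400, 22))
```

Note that the check compares geometry only; as stated above, singular values must also agree for motions to actually be similar.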
Hence, given multi-attribute motion data, we propose to apply SVD to obtain the first right singular vectors, which will be used as the representative vectors for indexing. For convenience, we will refer to right singular vectors simply as singular vectors, and consider them as transposed row vectors hereafter.

4.2 Dimensionality Reduction of Representative Vectors

The lengths, or dimensions, of the first singular vectors of multi-attribute motion data are usually greater than 15 (the length is 22 for our experimental data), and the performance of the R-tree and its variants degrades significantly when the dimensionality is larger than 15. In order to index the first singular vectors efficiently, dimensionality reduction needs to be performed on them first.

Figure 4: The second singular vectors u_2 and v_2 for similar motions by two different subjects, with u_2 · v_2 = 0.57. The large variations in the second singular vectors show that they might not be close to each other due to the variations in the similar motions.

Figure 5: Component Distributions of the First Singular Vectors

Figure 5 illustrates the component distributions of the first singular vectors for six sets of motions. Motion 1 is similar to motion 2, and both are different from the other four motions. Motion 3 is similar to motion 4, and motion 5 is similar to motion 6. Motions 3 and 4 are different from motions 5 and 6. Since the norm of a singular vector is 1 by the definition of SVD, the component values of the first singular vectors are between -1 and 1. It can be seen from Figure 5 that although the first singular vectors are close to each other for similar motions, they can have small variations in any component.
Although the first singular vectors of dissimilar motions are mostly different from each other, some corresponding component values can be very close. All these distribution properties of the first singular vectors make the R-tree and its variants inapplicable to the first singular vectors. The zigzag variation of the first singular vectors illustrated in Figure 5 indicates that APCA [4], the superior time series dimensionality reduction technique, is also not applicable to first singular vectors. Since they all have equal lengths, or dimensions, the first singular vectors form a singular vector matrix A if we stack these row vectors together. The number of attributes needed to capture motions is usually

small, say dozens or at most several hundred, making SVD the best dimensionality reduction technique here, considering that SVD is optimal for linear projections. Let A be the matrix composed of the first right singular vectors of the motions to be indexed, and let

A = W ΣZ^T     (1)

Then AZ = W Σ gives the transformed first singular vectors of the motion patterns in the coordinate system spanned by the column vectors of Z [13], and for a singular vector u_1 of a query motion, u_1 Z gives the corresponding transformed singular vector in the same system. Due to the singular value decomposition, the component variations of the transformed first singular vectors are largest along direction z_1 and decrease along directions z_2, ..., z_n, as shown in Figure 6. The transformed first singular vectors in Figure 6 are for the same six motions as in Figure 5. Unlike the first singular vectors shown in Figure 5, the first several components of the transformed vectors have large variations, while the remaining components become very small and approach zero. The differences among the first singular vectors are optimally reflected in the first several dimensions of the transformed first singular vectors, hence we can index the transformed first singular vectors by keeping only the first several components. Differences among all the other components are small even when the motions are different, so those components can be ignored and the dimensionality reduced to the first several components. The reason for this behavior is that Z gives the directions along which the variations of the singular vectors in A decrease from the first column vector to the last column vector of Z. AZ projects the singular vectors in A along the directions of the column vectors of Z, and u_1 Z projects the query singular vector u_1 along the same directions.
Hence, the transformed vectors in AZ have the property that the variations among them decrease from the first component to the last, and the first r components optimally approximate the differences of the original singular vectors. Hence, only r components of the transformed vectors need to be kept, and all the remaining components can be truncated. We refer to the transformed first singular vectors with reduced dimensionalities as the reduced first singular vectors.

4.3 Negation of Reduced Singular Vectors

Due to the sign non-uniqueness of singular vectors, the reduced singular vectors of similar motions can have opposite signs. If the first component of a reduced first singular vector is negative, we negate all components of the vector, so that similar motions have similar reduced vectors after negation. If the first component of the reduced singular vector is close to zero, a small variation in the first component of a similar motion may hide the opposite sign of the reduced singular vector, and negation according to the first component might not resolve the sign non-uniqueness. In this case, both the reduced vector and its negation are included in the database so as not to miss similar motions for a query whose reduced singular vector has a first component close to zero. Since the first singular vectors are unit vectors with norm one, the norms of the reduced first singular vectors are less than one. Hence the first components of the reduced singular vectors range over (-1, 1). Because the first components have the largest variance among all components, as shown in Figure 6, they do not cluster around zero. Reduced singular vectors whose first components are close to zero are only a small fraction of all reduced singular vectors, so including both those reduced vectors and their negations in the index increases the actual number of indexed vectors by only a small fraction.
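The reduction and negation steps above can be sketched as follows (a NumPy sketch under our own naming; A stacks the first singular vectors as rows, as in equation (1), and the eps cutoff is an illustrative choice):

```python
import numpy as np

def reduction_basis(A):
    """SVD basis Z of the singular-vector matrix A (A = W Sigma Z^T)."""
    _, _, Zt = np.linalg.svd(A, full_matrices=False)
    return Zt.T

def reduce_vector(u1, Z, r):
    """Transform a first singular vector into the Z basis and keep
    the r highest-variation components."""
    return (u1 @ Z)[:r]

def negated_copies(v, eps=0.05):
    """Negate so the first component is positive; if it lies within
    eps of zero, index both the vector and its negation."""
    if v[0] > eps:
        return [v]
    if v[0] < -eps:
        return [-v]
    return [v, -v]

# Rows of A: unit-norm first singular vectors of ten patterns.
rng = np.random.default_rng(3)
A = rng.normal(size=(10, 22))
A /= np.linalg.norm(A, axis=1, keepdims=True)
Z = reduction_basis(A)
reduced = np.array([reduce_vector(u, Z, 3) for u in A])
```

A query vector is reduced with the same Z, so patterns and queries land in the same r-dimensional space.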
All reduced singular vectors are considered to have been negated as necessary from now on.

Figure 6: Component Distributions of the Transformed First Singular Vectors

4.4 Effect of Large Component Variations

We now make some important observations. As Figure 6 shows, if two motions are different, their corresponding first few transformed components are very likely to be different, so the motions can very likely be separated. If two motions are similar, the first components of their reduced first singular vectors are very close to each other, while the other components can have relatively larger differences. The variations in the corresponding components, especially the very first few, should be within a small range for similar motions. If some corresponding components are quite different from each other, the two motions cannot be similar even when the Euclidean distance between the two reduced vectors is small. The Euclidean distance between the reduced vectors of motion 1 and motion 2 is 0.51, while the Euclidean distance between the reduced vectors of motion 1 and motion 3 is 0.48, as shown in Figure 7. Due to the approximation caused by dimensionality reduction, the retained vectors of more similar motions can have a somewhat larger Euclidean distance than the retained vectors of less similar motions. If we use the Minimum Bounding Rectangle (MBR) concept as in an R-tree, motions 1 and 3 are more likely to be grouped together than motions 1 and 2, although motions 1 and 2 are more similar to each other than motions 1 and 3. Hence an R-tree cannot efficiently index the reduced first singular vectors of multi-attribute motions.
The main reason is that two motions cannot be similar when any of the very first corresponding components of their reduced first singular vectors are quite different from each other, no matter how close all the other components are. We refer to this as the pruning effect of large component variations.

Figure 7: Pruning Effect of Large Component Variations

5. INDEX TREE CONSTRUCTION

After generating the vectors for indexing, we construct an index tree structure as follows. Let r be the number of retained dimensions of the reduced first singular vectors, r < n. We designate one level of the index tree to each of the r dimensions. Let level 1 be the root node, let level i contain the nodes for dimension i, i = 1, 2, ..., r, and let level r + 1 contain the leaf nodes. Leaf nodes contain unique motion identifiers P_k, and non-leaf nodes contain entries of the form (I_i, cp), where I_i is a closed interval [a, b] describing the component value ranges of the reduced first singular vectors at level i, with -1 ≤ a < 1 and -1 < b ≤ 1, and cp is the address of the entry's child node in the tree. The index tree has the following properties. I_i does not overlap among sibling entries except at the two shared endpoints. The interval width is the same for all entries of a node except the first and last entries at a level, and can differ between nodes at different levels. The first entry has I_i = [-ε, b_1] at level 1 and I_i = [-1, b] at all other non-leaf levels, and the last entries have I_i = [a, 1] at all non-leaf levels, where ε is some small positive value, 0 < b_1 < 1, -1 < b < 0, and 0 < a < 1. The interval width of I_i is smaller at level 1 than at any other non-leaf level. The interval widths at different levels are determined according to the possible variations in the component values of the reduced first singular vectors of similar motions. Since singular vectors are unique up to sign, the reduced first singular vectors of two similar motions can have opposite signs. We negate all the vector components if the first component of the reduced vector is negative.
If the first component of a reduced singular vector is close to zero, both the reduced vector and its negation are included in the index tree with the same identifier P_k.

5.1 Searching

For each level i of the index tree, we have a tolerance ε_i which accounts for the variation in the i-th component of the reduced first singular vectors of similar motions. Let the root node of the tree be T. The searching process is performed as follows.

Query Processing: Given a query motion, first compute its SVD and obtain its first singular vector u_1. Then compute the transformed vector u_1 Z, where Z is the SVD component of the matrix A composed of the first singular vectors of the pattern motions as in (1). Truncate u_1 Z to dimensionality r; let u_1^r be the reduced u_1 Z, and let c_i be the i-th component of u_1^r.

Subtree Searching: If T is a non-leaf node, find all entries whose I_i overlaps with [c_i - ε_i, c_i + ε_i]. For each overlapping entry, search the subtree whose root node is pointed to by the entry's cp.

Leaf Node Searching: If T is a leaf node, return all the motion pattern identifiers P_k contained in T.

Let |I_i| be the entry interval width at level i; then about ⌈2ε_i/|I_i|⌉ entries per traversed node at level i are covered by the searching range, so about ⌈2ε_i/|I_i|⌉ nodes are traversed at level i + 1. Traversing a tree of r + 1 levels thus takes time O(⌈2ε_i/|I_i|⌉^r). We construct the index tree such that ⌈2ε_i/|I_i|⌉ ≤ 3 and r < 10. Since each level has -1 and 1 as its lower and upper value bounds, the number of entries in a node and the entry intervals I_i are bounded and are independent of the number of pattern motions indexed by the tree. The tolerance 2ε_i accounts for the variation at level i, and is also independent of the number of pattern motions in the index.
Hence the searching ranges, or the number of searched entries for a query, are bounded for similar motions at each level, and are independent of the number of pattern motions. Because the number of tree levels r + 1 is bounded and is usually less than 10, the searching time should be independent of the number of patterns indexed by the tree, making the index structure highly scalable to large numbers of multi-attribute motions. As will be shown in Section 6, the CPU time for searching is quite small for index trees of level 7 in our implementation. Figure 8 illustrates the first three levels of an example index tree. The root node at level 1 has four entries, each of which has a child node at level 2. Each node at levels 2 and 3 has three entries, each of which has a child node at the next lower level. Given a query u_1^r = (0.65, 0.15, -0.1, ...), let ε_1 = 0.04 and ε_i = 0.08 for i ≥ 2. Entries at the root node are checked against [0.65 - 0.04, 0.65 + 0.04] = [0.61, 0.69]. Only the third entry overlaps with it, so the query is forwarded only to node n3 at level 2. At level 2, the query searching range is [0.15 - 0.08, 0.15 + 0.08] = [0.07, 0.23]. The second and third entries of node n3 overlap with this range, hence the query is forwarded to nodes n5 and n6 at level 3. At level 3, the query searching range is [-0.1 - 0.08, -0.1 + 0.08] = [-0.18, -0.02]. Only the second entries of nodes n5 and n6 overlap with this range, so the nodes pointed to by these entries are traversed next. The search continues until the leaf nodes are traversed and all contained pattern identifiers P_k are returned as the result of the query.
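The traversal just described can be sketched with dictionary-based nodes (a toy sketch under our own representation, not the paper's data structure; the two-non-leaf-level tree below is a miniature, not the Figure 8 tree):

```python
def search(node, query, eps, level=0):
    """Collect pattern identifiers reachable while the entry interval
    [a, b] overlaps [c_i - eps_i, c_i + eps_i] at every level i."""
    if "patterns" in node:                # leaf node
        return set(node["patterns"])
    c = query[level]
    lo, hi = c - eps[level], c + eps[level]
    found = set()
    for (a, b), child in node["entries"]:
        if a <= hi and b >= lo:           # intervals overlap
            found |= search(child, query, eps, level + 1)
    return found

# A miniature tree: one root level and one more non-leaf level.
tree = {"entries": [
    ((-0.2, 0.3), {"entries": [((-1.0, 1.0), {"patterns": ["P1"]})]}),
    ((0.3, 0.7), {"entries": [((-1.0, 0.2), {"patterns": ["P2"]}),
                              ((0.2, 1.0), {"patterns": ["P3"]})]}),
]}

# Query components 0.65 and 0.15 with tolerances 0.04 and 0.08:
# level-1 range [0.61, 0.69] selects only the second root entry;
# level-2 range [0.07, 0.23] overlaps both of its child entries.
print(search(tree, [0.65, 0.15], [0.04, 0.08]))
```

Only one component of the query is examined per level, which is the key difference from an R-tree traversal.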

[Figure 8: An Index Tree Example Showing Three Non-leaf Levels. The root entries cover [−0.2, 0.3], [0.3, 0.5], [0.5, 0.7], and [0.7, 1]; each level-2 and level-3 node has entries covering [−1, −0.2], [−0.2, 0.2], and [0.2, 1]. Bold lines show how an example query traverses the nodes.]

5.2 Similarity Computation

After the index tree has been searched for a query, the majority of irrelevant motions have been pruned, and the similar motions together with a small number of irrelevant motions are returned as the result of the query. To find the motion most similar to the query, the similarity measure defined in [15], shown below, can be used to compute the similarity of the query to each returned motion; the motion with the highest similarity is the one most similar to the query.

Ψ(Q, P_k) = |u_1 · v_1| (σ' · λ' − η)/(1 − η)    (2)

where u_1 and v_1 are the first singular vectors of Q and P_k, respectively, σ' = σ/|σ|, λ' = λ/|λ|, and σ and λ are the vectors of the singular values of Q^T Q and P_k^T P_k, respectively. The weight parameter η ensures that σ and λ contribute to the similarity measure comparably to u_1 and v_1, and is determined by experiments; η can be set to 0.9 for the multi-attribute motion data.

6. PERFORMANCE EVALUATION

In this section, we evaluate the performance of the index tree in pruning non-promising motions for a query. Let N_pr be the number of irrelevant motions pruned for a query by the index tree, and N_ir be the total number of irrelevant motions in the database. We define the pruning efficiency P as

P = N_pr / N_ir × 100%

6.1 Motion Data Generation

The motion data was generated by using a CyberGlove, a fully instrumented glove with 22 sensors as shown in Figure 9, which provides 22 high-accuracy joint-angle measurements representing the angles of the physical hand joints at different parts of a hand. When signs (mostly American Sign Language signs) are performed, each sensor emits normalized joint-angle values at a rate of about 120 samples per second.
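As a concrete illustration of similarity measure (2) from Section 5.2, the following sketch computes Ψ with NumPy, assuming η = 0.9 as in the text; the random test motion is a stand-in for CyberGlove data.

```python
import numpy as np

def similarity(Q, P, eta=0.9):
    """Sketch of similarity measure (2): Psi(Q, P_k) combines the first
    right singular vectors u1, v1 with the normalized singular-value
    vectors of Q^T Q and P^T P, weighted by eta."""
    _, sq, vtq = np.linalg.svd(Q, full_matrices=False)
    _, sp, vtp = np.linalg.svd(P, full_matrices=False)
    u1, v1 = vtq[0], vtp[0]
    sigma = sq ** 2                  # singular values of Q^T Q
    lam = sp ** 2                    # singular values of P^T P
    sigma_n = sigma / np.linalg.norm(sigma)
    lam_n = lam / np.linalg.norm(lam)
    return abs(u1 @ v1) * (sigma_n @ lam_n - eta) / (1 - eta)

# Toy usage: a 120-frame, 22-attribute motion compared with itself.
rng = np.random.default_rng(1)
motion = rng.random((120, 22))
print(round(similarity(motion, motion), 6))  # identical motions -> 1.0
```

For identical motions both factors equal 1, so Ψ = 1; because σ' · λ' is typically close to 1 for motion data, subtracting η and dividing by 1 − η amplifies the differences between the singular-value spectra, which is why η is set as high as 0.9.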
The angle data from all sensors depict the shape of the glove at each sampling time, and continuous shape changes show the motion of the glove; hence we consider the angle measurements from the sensors as multi-attribute motion data.

[Figure 9: 22 Sensors in a CyberGlove]

One hundred different motions were carried out. Each motion has a different duration, and the resultant motion data matrices have different lengths, ranging from about 200 to about 1500 rows. Each motion was repeated three times, so that each motion has two other similar motions with different motion variations and different durations. The data matrix of one motion can be more than two times the length of the data matrix of a similar motion. Each motion data matrix is given a unique pattern identifier P_k.

6.2 Index Structure Building

The index structure is built based on the distribution properties of the reduced singular vectors of multi-attribute motion data. For testing the performance of the index tree, indexes with different tree configurations have been built. I_i, r, and ε_i are determined as follows.

The component value interval I_i is set to around 2ε_i, so that high pruning efficiency can be obtained with fast query searching. The reason is two-fold. Level i of the index tree contains nodes for dimension i, 1 ≤ i ≤ r. If the width of I_i is much smaller than 2ε_i, the search has to traverse many sibling nodes rather than a couple of nodes at level i + 1, and the subsequent traversal of the subtrees can cost much more searching time than necessary. On the other hand, if the width of I_i is much larger than 2ε_i, fewer irrelevant motions are pruned: although searching time can be saved by searching only one node at level i + 1, more motion data is included in the leaf nodes of the subtrees of an entry with a large component value interval I_i.
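The trade-off above, keeping I_i close to 2ε_i, can be illustrated with a minimal grid-based construction sketch: each level partitions [−1, 1] into fixed intervals of width I_i, and a reduced singular vector is routed one component per level. The interval widths, vectors, and identifiers below are illustrative assumptions, not the paper's configuration.

```python
# Per-level entry-interval widths I_i, chosen close to 2 * eps_i
# (a finer grid at level 1, as in the text's tree configurations).
WIDTHS = [0.1, 0.2, 0.2, 0.2, 0.2]

def insert(tree, vec, pattern_id, level=0):
    """Route a reduced first singular vector down the tree, creating
    child nodes on demand; the leaf stores the identifier P_k."""
    if level == len(vec):
        tree.setdefault("ids", []).append(pattern_id)
        return
    key = int((vec[level] + 1) // WIDTHS[level])  # covering interval index
    insert(tree.setdefault(key, {}), vec, pattern_id, level + 1)

def search(tree, vec, eps, level=0):
    """Visit every entry whose interval overlaps [c_i - eps_i, c_i + eps_i];
    with I_i ~ 2 * eps_i this is only a couple of entries per node."""
    if level == len(vec):
        return set(tree.get("ids", []))
    w = WIDTHS[level]
    lo = int((vec[level] - eps[level] + 1) // w)
    hi = int((vec[level] + eps[level] + 1) // w)
    found = set()
    for key in range(lo, hi + 1):
        if key in tree:
            found |= search(tree[key], vec, eps, level + 1)
    return found

index = {}
insert(index, (0.65, 0.15, -0.1, 0.3, 0.0), "P7")
insert(index, (0.66, 0.12, -0.13, 0.28, 0.05), "P7b")  # a similar motion
insert(index, (-0.4, 0.5, 0.2, -0.1, 0.6), "P9")       # a dissimilar motion
print(sorted(search(index, (0.64, 0.14, -0.11, 0.29, 0.02),
                    (0.04, 0.08, 0.08, 0.08, 0.08))))
```

Running the query keeps both variants of the similar motion while the dissimilar one is pruned at the root level; halving the widths would force the query window to span more sibling entries per node, and doubling them would leave more irrelevant patterns in the visited leaves.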
We observed that 5 dimensions are sufficient to prune irrelevant motions and to give high pruning efficiency. The transformed first singular vectors are thus truncated to dimensionality r = 5, and the index tree can have a height of only 6, including the leaf node level.

Figure 10 and Figure 11 show that for similar motions, the variations in the first components of the reduced first singular vectors are mostly within 0.04, and the other component

variations are largely within 0.1. Taking into consideration that the variations of the reduced singular vectors of similar motions at dimension 1 are smaller than those at the other dimensions, we build the index tree with the root node having more child nodes than any other non-leaf node. Let L be the number of levels of the index tree, L = r + 1, and let N_i be the number of entries a node at level i has, i ≥ 1. In our implementation, we have N_i = N_2 and ε_i = ε_2 for i > 2.

[Figure 10: Variations in the First Components of the Reduced Singular Vectors]

[Figure 11: Variations in the Second to Sixth Components of the Reduced Singular Vectors]

6.3 Pruning Efficiency

We tested the pruning efficiencies of the index tree by using different tree configurations and different variation tolerances ε_1 and ε_2 for similar motions. We built trees with 8, 10, and 12 entries at level one, and 5, 6, and 7 entries per node at the other non-leaf levels, respectively. ε_1 is set to 0.04–0.1, and ε_2 is set to 0.08–0.2. For each test case, we issued 300 queries and computed the average pruning efficiency, as shown in Figure 12. When ε_2 increases while the other parameters remain the same, more entries are covered at the second and higher levels, and larger searching ranges or more nodes are traversed, resulting in decreasing pruning efficiency, as shown in Figure 12. When ε_1 is set to 0.04, more than 98% of queries return all three similar motions, including one exact match.
About 2% of queries return only the two most similar motions, including one exact match; the similar motions with the largest variations are pruned, i.e., falsely dismissed. In this case, more than 94% of irrelevant motions are pruned when 6 dimensions are retained (L = 7), and about 91% of irrelevant motions are pruned when 5 dimensions are retained (L = 6), as shown in Figure 12. When ε_1 is set to 0.1, all queries return all three similar motions, and there are no false dismissals regardless of the motion variations. In this case, the pruning efficiency can be as high as 93% when N_1 = 12 and N_2 = 7. If we decrease the number of entries per node so that N_1 = 10 and N_2 = 6, the pruning efficiency is still as high as 91%. If ε_1 < 0.04 or ε_2 < 0.08, there would be false dismissals of similar motions with large variations. When ε_1 = 0.1 and ε_2 = 0.08, there are no false dismissals and the pruning efficiency can be up to 91–93%.

[Figure 12: Pruning Efficiency under Different Tree Configurations and Searching Ranges]

6.4 Computational Efficiency

We measured the average CPU time spent on the 300 queries using the index tree to prune irrelevant motions, and compared it with the CPU time for the same 300 queries by linear scan. All experiments were performed on a computer with Linux OS and one 1.4 GHz Intel Pentium 4 processor. On average, 1.25 ms is required for computing the SVD of a query matrix, and about 1 µs is required for computing the similarity of the query and one pattern after the SVD of the query matrix has been computed. About 300 µs are required to find the most similar motion of a query among all 300 patterns if linear scan is used. Dimensionality reduction of a query's first singular vector takes about 17 µs, and the CPU time for searching the index tree is shown in Figure 13. The CPU time increases as expected when ε_2 increases. When only the most similar motions are expected for the queries, searching takes less than 10 µs when ε_1 = 0.04 and ε_2 = 0.08.
The search time is less than 30 µs when ε_1 = 0.1 and all similar motions are returned for the 300 queries. Since the searching time is independent of the number of pattern motions, dimensionality reduction of a query's first singular vector and the subsequent searching of the index tree together take less than 50 µs, while up to 93% of irrelevant motions are pruned. Hence, less than 80 µs is required for reducing the first singular vector of the query matrix, searching the

index tree, and computing the similarities of the query and the returned patterns. This is to be compared with the roughly 300 µs required for a linear scan of the 300 patterns. It should be noted that the time taken by linear scan is proportional to the number of pattern motions: it would take linear scan several milliseconds to compute the similarities between a query and several thousand pattern motions, while searching the index tree takes less than 30 µs no matter how many motion patterns are indexed in the tree.

[Figure 13: CPU Time under Different Tree Configurations and Searching Ranges]

7. CONCLUSIONS AND DISCUSSIONS

In this paper, we considered the problem of querying by example on a repository of multi-attribute motion data, focusing on an efficient indexing structure for this problem. We showed that the geometric structure exposed by the SVD of multi-attribute motion data matrices can be exploited to develop an efficient index tree. Based on these observations, we proposed an efficient interval-tree based index structure that uses the component values of the reduced first singular vectors. In the proposed structure, one level of the index tree is designated to each dimension of the reduced singular vectors. The searching time is independent of the number of motions indexed by the tree, making the proposed index structure well scalable to large data repositories. After searching the index tree for a query Q, the similarity of Q and each returned pattern P_k can be computed by similarity measure (2) [15]. Experiments on real-world data generated using a CyberGlove show that the proposed index tree can prune up to 91–93% of irrelevant motions on average with no false dismissals, and a query search takes less than 30 µs.
Hence, the proposed indexing technique can also be used for real-time multi-attribute motion data comparison.

We proposed an index structure for multi-attribute motion data based on the geometric structure revealed by the SVD of the motion data matrices. As long as the data sequences have geometric structure such that the variances along the directions of the first right singular vectors are much larger than those along other directions, the proposed index structure is applicable to the sequences. Since typical multimedia applications do not generate random data, we believe the proposed index structure is applicable to typical multimedia applications.

8. REFERENCES

[1] R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. In Proceedings of the 4th Conference on Foundations of Data Organization and Algorithms, pages 69–84, October 1993.

[2] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: an efficient and robust access method for points and rectangles. In Proceedings of ACM SIGMOD Int'l Conference on Management of Data, pages 322–331, May 1990.

[3] V. Castelli, A. Thomasian, and C.-S. Li. CSVD: Clustering and singular value decomposition for approximate similarity search in high-dimensional spaces. IEEE Transactions on Knowledge and Data Engineering, 15(3):671–685, 2003.

[4] K. Chakrabarti, E. J. Keogh, S. Mehrotra, and M. J. Pazzani. Locally adaptive dimensionality reduction for indexing large time series databases. ACM Transactions on Database Systems, 27(2):188–228, 2002.

[5] K.-P. Chan and A. W.-C. Fu. Efficient time series matching by wavelets. In Proceedings of the 15th International Conference on Data Engineering, pages 126–133, March 1999.

[6] K.-P. Chan, A. W.-C. Fu, and C. Yu. Haar wavelets for efficient similarity search of time-series: With and without time warping. IEEE Transactions on Knowledge and Data Engineering, 15(3):686–705, 2003.

[7] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos.
Fast subsequence matching in time-series databases. In Proceedings of ACM SIGMOD Int'l Conference on Management of Data, pages 419–429, May 1994.

[8] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, Maryland, 1996.

[9] A. Guttman. R-trees: a dynamic index structure for spatial searching. In Proceedings of ACM SIGMOD Int'l Conference on Management of Data, pages 47–57, June 1984.

[10] T. Kahveci, A. Singh, and A. Gurel. Similarity searching for multi-attribute sequences. In Proceedings of 14th Int'l Conference on Scientific and Statistical Database Management, July 2002.

[11] N. Katayama and S. Satoh. The SR-tree: an index structure for high-dimensional nearest neighbor queries. In Proceedings of ACM SIGMOD Int'l Conference on Management of Data, pages 369–380, May 1997.

[12] E. J. Keogh, K. Chakrabarti, M. J. Pazzani, and S. Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3):263–286, 2001.

[13] F. Korn, H. V. Jagadish, and C. Faloutsos. Efficiently supporting ad hoc queries in large datasets of time sequences. In Proceedings of ACM SIGMOD Int'l Conference on Management of Data, pages 289–300, May 1997.

[14] S.-L. Lee, S.-J. Chun, D.-H. Kim, J.-H. Lee, and C.-W. Chung. Similarity search for multidimensional data sequences. In Proceedings of 16th Int'l Conference on Data Engineering, pages 599–608, Feb./Mar. 2000.

[15] C. Li, P. Zhai, S.-Q. Zheng, and B. Prabhakaran. Segmentation and recognition of multi-attribute motion sequences. To appear in Proceedings of the ACM Multimedia Conference 2004, October 2004.

[16] I. Popivanov and R. J. Miller. Similarity search over time series data using wavelets. In Proceedings of the 18th International Conference on Data Engineering, pages 212–221, February 2002.

[17] B. De Schutter and B. De Moor. The singular value decomposition in the extended max algebra. Linear Algebra and Its Applications, 250:143–176, 1997.

[18] C. Shahabi and D. Yan. Real-time pattern isolation and recognition over immersive sensor data streams. In Proceedings of the 9th Int'l Conference on Multi-Media Modeling, January 2003.

[19] M. Vlachos, M. Hadjieleftheriou, D. Gunopulos, and E. Keogh. Indexing multi-dimensional time-series with support for multiple distance measures. In Proceedings of ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, pages 216–225, August 2003.


More information

Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases

Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases Xiang Lian and Lei Chen Department of Computer Science and Engineering Hong Kong University of Science and Technology Clear

More information

5 Spatial Access Methods

5 Spatial Access Methods 5 Spatial Access Methods 5.1 Quadtree 5.2 R-tree 5.3 K-d tree 5.4 BSP tree 3 27 A 17 5 G E 4 7 13 28 B 26 11 J 29 18 K 5.5 Grid file 9 31 8 17 24 28 5.6 Summary D C 21 22 F 15 23 H Spatial Databases and

More information

Identifying Patterns and Anomalies in Delayed Neutron Monitor Data of Nuclear Power Plant

Identifying Patterns and Anomalies in Delayed Neutron Monitor Data of Nuclear Power Plant Identifying Patterns and Anomalies in Delayed Neutron Monitor Data of Nuclear Power Plant Durga Toshniwal, Aditya Gupta Department of Computer Science & Engineering Indian Institute of Technology Roorkee

More information

Machine Learning on temporal data

Machine Learning on temporal data Machine Learning on temporal data Learning Dissimilarities on Time Series Ahlame Douzal (Ahlame.Douzal@imag.fr) AMA, LIG, Université Joseph Fourier Master 2R - MOSIG (2011) Plan Time series structure and

More information

Test Generation for Designs with Multiple Clocks

Test Generation for Designs with Multiple Clocks 39.1 Test Generation for Designs with Multiple Clocks Xijiang Lin and Rob Thompson Mentor Graphics Corp. 8005 SW Boeckman Rd. Wilsonville, OR 97070 Abstract To improve the system performance, designs with

More information

9 Searching the Internet with the SVD

9 Searching the Internet with the SVD 9 Searching the Internet with the SVD 9.1 Information retrieval Over the last 20 years the number of internet users has grown exponentially with time; see Figure 1. Trying to extract information from this

More information

Analysis of the Current Distribution in the ITER CS-Insert Model Coil Conductor by Self Field Measurements

Analysis of the Current Distribution in the ITER CS-Insert Model Coil Conductor by Self Field Measurements IEEE TRANSACTIONS ON APPLIED SUPERCONDUCTIVITY, VOL. 12, NO. 1, MARCH 2002 1675 Analysis of the Current Distribution in the ITER CS-Insert Model Coil Conductor by Self Field Measurements A. Nijhuis, Yu.

More information

Dimension reduction based on centroids and least squares for efficient processing of text data

Dimension reduction based on centroids and least squares for efficient processing of text data Dimension reduction based on centroids and least squares for efficient processing of text data M. Jeon,H.Park, and J.B. Rosen Abstract Dimension reduction in today s vector space based information retrieval

More information

Tailored Bregman Ball Trees for Effective Nearest Neighbors

Tailored Bregman Ball Trees for Effective Nearest Neighbors Tailored Bregman Ball Trees for Effective Nearest Neighbors Frank Nielsen 1 Paolo Piro 2 Michel Barlaud 2 1 Ecole Polytechnique, LIX, Palaiseau, France 2 CNRS / University of Nice-Sophia Antipolis, Sophia

More information

Latent Semantic Models. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

Latent Semantic Models. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze Latent Semantic Models Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze 1 Vector Space Model: Pros Automatic selection of index terms Partial matching of queries

More information

Study of Wavelet Functions of Discrete Wavelet Transformation in Image Watermarking

Study of Wavelet Functions of Discrete Wavelet Transformation in Image Watermarking Study of Wavelet Functions of Discrete Wavelet Transformation in Image Watermarking Navdeep Goel 1,a, Gurwinder Singh 2,b 1ECE Section, Yadavindra College of Engineering, Talwandi Sabo 2Research Scholar,

More information

4th year Project demo presentation

4th year Project demo presentation 4th year Project demo presentation Colm Ó héigeartaigh CASE4-99387212 coheig-case4@computing.dcu.ie 4th year Project demo presentation p. 1/23 Table of Contents An Introduction to Quantum Computing The

More information

Linear Least-Squares Data Fitting

Linear Least-Squares Data Fitting CHAPTER 6 Linear Least-Squares Data Fitting 61 Introduction Recall that in chapter 3 we were discussing linear systems of equations, written in shorthand in the form Ax = b In chapter 3, we just considered

More information

Nearest Neighbor Search with Keywords

Nearest Neighbor Search with Keywords Nearest Neighbor Search with Keywords Yufei Tao KAIST June 3, 2013 In recent years, many search engines have started to support queries that combine keyword search with geography-related predicates (e.g.,

More information

Singular Value Decomposition and Digital Image Compression

Singular Value Decomposition and Digital Image Compression Singular Value Decomposition and Digital Image Compression Chris Bingham December 1, 016 Page 1 of Abstract The purpose of this document is to be a very basic introduction to the singular value decomposition

More information

Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the

More information

Activity Mining in Sensor Networks

Activity Mining in Sensor Networks MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Activity Mining in Sensor Networks Christopher R. Wren, David C. Minnen TR2004-135 December 2004 Abstract We present results from the exploration

More information

CS 340 Lec. 6: Linear Dimensionality Reduction

CS 340 Lec. 6: Linear Dimensionality Reduction CS 340 Lec. 6: Linear Dimensionality Reduction AD January 2011 AD () January 2011 1 / 46 Linear Dimensionality Reduction Introduction & Motivation Brief Review of Linear Algebra Principal Component Analysis

More information

arxiv: v2 [cs.lg] 13 Mar 2018

arxiv: v2 [cs.lg] 13 Mar 2018 arxiv:1802.03628v2 [cs.lg] 13 Mar 2018 ABSTRACT Han Qiu Massachusetts Institute of Technology Cambridge, MA, USA hanqiu@mit.edu Francesco Fusco IBM Research Dublin, Ireland francfus@ie.ibm.com We propose

More information

Covers new Math TEKS!

Covers new Math TEKS! Covers new Math TEKS! 4 UPDATED FOR GREEN APPLE GRADE 4 MATH GREEN APPLE EDUCATIONAL PRODUCTS Copyright infringement is a violation of Federal Law. by Green Apple Educational Products, Inc., La Vernia,

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa VS model in practice Document and query are represented by term vectors Terms are not necessarily orthogonal to each other Synonymy: car v.s. automobile Polysemy:

More information

Indexing Objects Moving on Fixed Networks

Indexing Objects Moving on Fixed Networks Indexing Objects Moving on Fixed Networks Elias Frentzos Department of Rural and Surveying Engineering, National Technical University of Athens, Zographou, GR-15773, Athens, Hellas efrentzo@central.ntua.gr

More information

Strengthened Sobolev inequalities for a random subspace of functions

Strengthened Sobolev inequalities for a random subspace of functions Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)

More information

CS 436 HCI Technology Basic Electricity/Electronics Review

CS 436 HCI Technology Basic Electricity/Electronics Review CS 436 HCI Technology Basic Electricity/Electronics Review *Copyright 1997-2008, Perry R. Cook, Princeton University August 27, 2008 1 Basic Quantities and Units 1.1 Charge Number of electrons or units

More information

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Alfredo Nava-Tudela ant@umd.edu John J. Benedetto Department of Mathematics jjb@umd.edu Abstract In this project we are

More information

Time Series (1) Information Systems M

Time Series (1) Information Systems M Time Series () Information Systems M Prof. Paolo Ciaccia http://www-db.deis.unibo.it/courses/si-m/ Time series are everywhere Time series, that is, sequences of observations (samples) made through time,

More information

Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed

More information

Wavelets for Efficient Querying of Large Multidimensional Datasets

Wavelets for Efficient Querying of Large Multidimensional Datasets Wavelets for Efficient Querying of Large Multidimensional Datasets Cyrus Shahabi University of Southern California Integrated Media Systems Center (IMSC) and Dept. of Computer Science Los Angeles, CA 90089-0781

More information

Lecture 18 Nov 3rd, 2015

Lecture 18 Nov 3rd, 2015 CS 229r: Algorithms for Big Data Fall 2015 Prof. Jelani Nelson Lecture 18 Nov 3rd, 2015 Scribe: Jefferson Lee 1 Overview Low-rank approximation, Compression Sensing 2 Last Time We looked at three different

More information

Can Vector Space Bases Model Context?

Can Vector Space Bases Model Context? Can Vector Space Bases Model Context? Massimo Melucci University of Padua Department of Information Engineering Via Gradenigo, 6/a 35031 Padova Italy melo@dei.unipd.it Abstract Current Information Retrieval

More information

Natural Language Processing. Topics in Information Retrieval. Updated 5/10

Natural Language Processing. Topics in Information Retrieval. Updated 5/10 Natural Language Processing Topics in Information Retrieval Updated 5/10 Outline Introduction to IR Design features of IR systems Evaluation measures The vector space model Latent semantic indexing Background

More information

EE731 Lecture Notes: Matrix Computations for Signal Processing

EE731 Lecture Notes: Matrix Computations for Signal Processing EE731 Lecture Notes: Matrix Computations for Signal Processing James P. Reilly c Department of Electrical and Computer Engineering McMaster University October 17, 005 Lecture 3 3 he Singular Value Decomposition

More information

Erik G. Hoel. Hanan Samet. Center for Automation Research. University of Maryland. November 7, Abstract

Erik G. Hoel. Hanan Samet. Center for Automation Research. University of Maryland. November 7, Abstract Proc. of the 21st Intl. Conf. onvery Large Data Bases, Zurich, Sept. 1995, pp. 606{618 Benchmarking Spatial Join Operations with Spatial Output Erik G. Hoel Hanan Samet Computer Science Department Center

More information

A Study of the Dirichlet Priors for Term Frequency Normalisation

A Study of the Dirichlet Priors for Term Frequency Normalisation A Study of the Dirichlet Priors for Term Frequency Normalisation ABSTRACT Ben He Department of Computing Science University of Glasgow Glasgow, United Kingdom ben@dcs.gla.ac.uk In Information Retrieval

More information

Algorithm 805: Computation and Uses of the Semidiscrete Matrix Decomposition

Algorithm 805: Computation and Uses of the Semidiscrete Matrix Decomposition Algorithm 85: Computation and Uses of the Semidiscrete Matrix Decomposition TAMARA G. KOLDA Sandia National Laboratories and DIANNE P. O LEARY University of Maryland We present algorithms for computing

More information

WAITING-TIME DISTRIBUTION FOR THE r th OCCURRENCE OF A COMPOUND PATTERN IN HIGHER-ORDER MARKOVIAN SEQUENCES

WAITING-TIME DISTRIBUTION FOR THE r th OCCURRENCE OF A COMPOUND PATTERN IN HIGHER-ORDER MARKOVIAN SEQUENCES WAITING-TIME DISTRIBUTION FOR THE r th OCCURRENCE OF A COMPOUND PATTERN IN HIGHER-ORDER MARKOVIAN SEQUENCES Donald E. K. Martin 1 and John A. D. Aston 2 1 Mathematics Department, Howard University, Washington,

More information

Quick Introduction to Nonnegative Matrix Factorization

Quick Introduction to Nonnegative Matrix Factorization Quick Introduction to Nonnegative Matrix Factorization Norm Matloff University of California at Davis 1 The Goal Given an u v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug

More information

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection

More information

Cambridge University Press The Mathematics of Signal Processing Steven B. Damelin and Willard Miller Excerpt More information

Cambridge University Press The Mathematics of Signal Processing Steven B. Damelin and Willard Miller Excerpt More information Introduction Consider a linear system y = Φx where Φ can be taken as an m n matrix acting on Euclidean space or more generally, a linear operator on a Hilbert space. We call the vector x a signal or input,

More information

CS 143 Linear Algebra Review

CS 143 Linear Algebra Review CS 143 Linear Algebra Review Stefan Roth September 29, 2003 Introductory Remarks This review does not aim at mathematical rigor very much, but instead at ease of understanding and conciseness. Please see

More information

Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams

Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams Jimeng Sun Spiros Papadimitriou Philip S. Yu Carnegie Mellon University Pittsburgh, PA, USA IBM T.J. Watson Research Center Hawthorne,

More information

Numerical Methods. Elena loli Piccolomini. Civil Engeneering. piccolom. Metodi Numerici M p. 1/??

Numerical Methods. Elena loli Piccolomini. Civil Engeneering.  piccolom. Metodi Numerici M p. 1/?? Metodi Numerici M p. 1/?? Numerical Methods Elena loli Piccolomini Civil Engeneering http://www.dm.unibo.it/ piccolom elena.loli@unibo.it Metodi Numerici M p. 2/?? Least Squares Data Fitting Measurement

More information

1. Ignoring case, extract all unique words from the entire set of documents.

1. Ignoring case, extract all unique words from the entire set of documents. CS 378 Introduction to Data Mining Spring 29 Lecture 2 Lecturer: Inderjit Dhillon Date: Jan. 27th, 29 Keywords: Vector space model, Latent Semantic Indexing(LSI), SVD 1 Vector Space Model The basic idea

More information

A Computationally Efficient Block Transmission Scheme Based on Approximated Cholesky Factors

A Computationally Efficient Block Transmission Scheme Based on Approximated Cholesky Factors A Computationally Efficient Block Transmission Scheme Based on Approximated Cholesky Factors C. Vincent Sinn Telecommunications Laboratory University of Sydney, Australia cvsinn@ee.usyd.edu.au Daniel Bielefeld

More information