Two Birds With One Stone: An Efficient Hierarchical Framework for Top-k and Threshold-based String Similarity Search


Jin Wang, Guoliang Li, Dong Deng, Yong Zhang, Jianhua Feng
Department of Computer Science and Technology, TNList, Tsinghua University, Beijing 100084, China.

Abstract

String similarity search is a fundamental operation in data cleaning and integration. It has two variants, threshold-based string similarity search and top-k string similarity search. Existing algorithms are efficient for either the former or the latter; most of them cannot support both variants. To address this limitation, we propose a unified framework. We first recursively partition strings into disjoint segments and build a hierarchical segment tree index (HS-Tree) on top of the segments. Then we utilize the HS-Tree to support similarity search. For threshold-based search, we identify appropriate tree nodes based on the threshold to answer the query and devise an efficient algorithm (HS-Search). For top-k search, we identify promising strings that are likely to be similar to the query, utilize these strings to estimate an upper bound which is used to prune dissimilar strings, and propose an algorithm (HS-Topk). We also develop effective pruning techniques to further improve the performance. Experimental results on real-world datasets show our method achieves high performance on the two problems and significantly outperforms state-of-the-art algorithms.

I. INTRODUCTION

As an important operation in data cleaning and integration, string similarity search has attracted significant attention from the database community. It has widespread real applications such as web search, spell checking, and DNA sequence discovery in bioinformatics [], []. Given a set of strings and a query, string similarity search aims to find all the strings from the string set that are similar to the query. There are many metrics to quantify the similarity between strings, such as Jaccard, Cosine and edit distance.
Among them, edit distance is one of the most widely used metrics to tolerate typographical errors [5], [6], [], [4]. In this paper we focus on using edit distance to quantify string similarity.

There are two variants of the string similarity search problem. The first is threshold-based string similarity search, which requires users to input a threshold and identifies all the strings from the string set whose edit distances to the query are within the given threshold. The second is top-k string similarity search, which finds the k strings from the string set that have the smallest edit distances to the query.

Existing studies on the threshold-based similarity search problem [2], [], [2], [6] employ a filter-verification framework. In the filter step, they utilize the threshold to devise effective filters in order to prune large numbers of dissimilar strings and generate a set of candidates. In the verification step, they verify the candidates by computing their real edit distances to the query. These threshold-based algorithms are rather expensive for top-k search: to support top-k search, they have to enumerate the threshold incrementally and execute the search operation for each threshold, which involves many unnecessary computations. On the other hand, existing studies on top-k similarity search [26], [27], [5] are inefficient for threshold-based search because they cannot make full use of the given threshold to do pruning. To summarize, existing methods are efficient either for threshold-based search or for top-k search, and most of them cannot efficiently support both problems. This calls for a unified framework that efficiently supports the two variants of string similarity search.

To address this problem, we propose a unified framework which can efficiently support these two variants. We first recursively partition data strings in the string set into disjoint segments and build a hierarchical segment tree index (HS-Tree) on top of the segments.
Then we utilize the HS-Tree to answer a threshold-based query or a top-k query. For threshold-based similarity search, based on the pigeonhole principle, if a data string is similar to the query, the data string must have enough segments matching some substrings of the query. We can utilize the HS-Tree to identify such strings and devise an efficient algorithm for threshold-based similarity search. For top-k similarity search, we first access the promising data strings that are likely to be similar to the query (e.g., the strings sharing the largest number of common segments with the query). Based on the promising strings, we can accurately estimate an upper bound of the edit distances of top-k answers to the query and utilize the bound to prune dissimilar strings. We utilize the HS-Tree to identify the promising data strings and devise an efficient algorithm for top-k similarity search. Experimental results on real datasets show our method achieves high performance on both problems and significantly outperforms state-of-the-art algorithms.

In this paper, we make the following contributions.
• We propose a hierarchical segment index, HS-Tree, which can be utilized to support both threshold-based similarity search and top-k similarity search.
• We devise the HS-Search algorithm based on the HS-Tree index to facilitate threshold-based similarity search.
• We propose the HS-Topk algorithm based on the HS-Tree index to support top-k similarity search. We develop batch-based and greedy-match-based pruning strategies to prune dissimilar strings.
• We have conducted an extensive set of experiments. Experimental results show our method achieves high performance on both threshold-based similarity search and top-k similarity search and significantly outperforms state-of-the-art algorithms.

The rest of the paper is organized as follows. We formulate our problem and review related works in Section II. We introduce our hierarchical index in Section III. The HS-Search algorithm is proposed to support threshold-based similarity search in Section IV and the HS-Topk algorithm is presented to support top-k similarity search in Section V. Experimental results are reported in Section VI. We conclude in Section VII.

(2015 IEEE, ICDE Conference 2015)

II. PRELIMINARIES

In this section, we first formulate the problem in Section II-A and then review related works in Section II-B.

A. Problem Definition

Given two strings $r$ and $s$, the edit distance between $r$ and $s$, denoted as $ED(r, s)$, is the minimum number of edit operations (including substitution, insertion, and deletion) needed to transform $r$ to $s$. There are two variants in string similarity search. The first identifies the strings from a string set whose edit distances to the query are not larger than a given threshold. The second finds the top-k strings with the smallest edit distances to the query. Next we formulate the two problems.

Definition 1 (Threshold-based Similarity Search): Given a string set $S$, a query $q$, and a threshold $\tau$, threshold-based similarity search finds all strings $s \in S$ such that $ED(s, q) \le \tau$.

Definition 2 (Top-k Similarity Search): Given a string set $S$, a query $q$, and an integer $k$, top-k similarity search finds a subset $R \subseteq S$ such that $|R| = k$ and, for any $r \in R$ and $s \in S - R$, $ED(r, q) \le ED(s, q)$.

Example 1: Consider the string set in Table I. Suppose the query is $q$ = "brothor", $\tau = 1$ and $k = 2$. The threshold-based similarity search returns {"brother"}, since the edit distance between "brother" and $q$ = "brothor" is 1 and the edit distances between the other strings and $q$ are larger than 1. The top-2 similarity search returns $R$ = {"brother", "brothel"}, because the edit distances between the two strings and $q$ are respectively 1 and 2, and the edit distances of the other strings to the query are not smaller than 2.

TABLE I. A STRING COLLECTION
ID    String                 Length
s1    brother                7
s2    brothel                7
s3    broathe                7
s4    breathes               8
s5    swingable              9
s6    deduction              9
s7    abna evina             10
s8    christopher swenson    19

B. Related Works

1) Threshold-based Similarity Search: There are many similarity search algorithms [2], [2], [6], [27], [4]. Most existing studies employ a filter-verification framework to address the string similarity search problem, and many effective filters have been devised to prune dissimilar strings. Sarawagi et al. [7] proposed the count filter based on n-grams. Li et al. [] extended the count filter and developed several list-merge algorithms to improve the performance. Wang et al. [2] proposed an adaptive prefix filtering framework to improve the performance. Zhang et al. [27] proposed the $B^{ed}$-tree, which utilizes a $B^+$-tree structure to index strings. Li et al. [2] used variable-length grams as signatures to support string similarity search. Qin et al. [6] devised an asymmetric signature. Hadjieleftheriou et al. [7] proposed a hash-based method to estimate the number of results.

These methods have two limitations. First, the n-gram-based signature has lower pruning power than our segment-based signature, because to avoid missing results, n cannot be large, and a small n has limited pruning power. Second, although we can extend such methods to support top-k similarity search by enumerating different thresholds, they are rather expensive as they have to run the search algorithm many times.

2) Top-k Similarity Search: There have already been some studies on top-k similarity search. Yang et al. [26] proposed an n-gram-based method to support top-k similarity search. It dynamically tunes the length of grams according to different thresholds. However, it needs to build duplicate indexes for each n, which leads to low efficiency. Deng et al. [5] proposed a range-based algorithm that groups specific entries to avoid duplicated computations in the dynamic-programming matrix when the edit distance is computed. This algorithm uses a trie index to share the common prefixes of strings. However, long strings share fewer common prefixes, so the pruning power is limited.
The $B^{ed}$-tree [27] can also be used to support top-k similarity search. However, this method utilizes n-grams to group similar strings together, which requires enumerating many different thresholds and thus leads to low efficiency. Wang et al. [22] designed a novel filter-and-refine pipeline approach that utilizes approximate n-gram matchings to compute top-k results.

Compared with these algorithms, our method has two advantages. First, we can utilize the HS-Tree to identify promising strings so as to estimate a tighter upper bound and use the bound to eliminate dissimilar strings. Thus our method achieves higher performance on top-k similarity search. Second, our method can also efficiently support threshold-based search, because it can select appropriate nodes in the HS-Tree based on the given threshold and utilize these nodes to efficiently identify the answer.

3) Similarity Join: There are many studies on string similarity join. Given two string sets, similarity join finds all similar string pairs [], [4], [9], [24], [25], [2], [6]. An experimental survey is made in [9]. Bayardo et al. [] proposed the prefix-filter-based method for similarity join. Xiao et al. proposed the position filter [25] and the mismatch filter [24] to improve the prefix filter. Wang et al. [9] proposed a trie-based method to support similarity join. Li et al. [4] proposed the segment filter with efficient substring selection and verification methods to perform similarity join. Although Adapt [2] and QChunk [6] extended join techniques to support search, they perform much worse than our method (see Section VI).

In this paper, we extend the segment-based filter in [4] to support similarity search. Different from PassJoin, which requires a threshold to be specified first and then builds the index based on that threshold, we can build an HS-Tree index in an offline step and utilize the index to answer similarity search queries with arbitrary thresholds.
For top-k similarity search, since it is rather hard to predefine an appropriate threshold, PassJoin has to enumerate the threshold and thus achieves low performance. Our HS-Tree addresses the limitations of PassJoin and can efficiently support top-k and threshold-based similarity search.

4) Other Related Works: Some previous studies focus on approximate entity extraction, which, given a dictionary of entities and a document, finds all substrings in the document that are similar to some entities. Wang et al. [2] proposed a neighborhood-deletion-based method to solve this problem. Li et al. [3] proposed a unified framework to support approximate entity extraction under various similarity functions. Recently Kim et al. [] proposed efficient algorithms for substring top-k similarity search, which finds the top-k approximate substrings matching a given query; this differs from our top-k similarity search problem. Another related topic is query autocompletion. Ji et al. [8] proposed a trie-based structure to support fuzzy prefix search and Li et al. [5] improved it by removing unnecessary nodes. Chaudhuri et al. [3] proposed a similar solution. Xiao et al. [23] extended the neighborhood deletion method to improve the performance of query autocompletion.

III. THE HIERARCHICAL SEGMENT INDEX

In this section, we introduce a hierarchical segment tree (HS-Tree) to index the data strings.

[Fig. 1. The HS-Tree Index. Strings are grouped by length, from $S_{l_{min}}$ to $S_{l_{max}}$; in each group, node $n^l_{i,j}$ at level $i$ holds the segment set $S^l_{i,j}$ and its inverted index $L^l_{i,j}$.]

Algorithm 1: HS-Tree Construction ($S$)
Input: $S$: the string set
Output: the HS-Tree index
1 begin
2   Group strings in $S$ by length;
3   for $l = l_{min}$ to $l_{max}$ do
4     Calculate the maximum level $L = \lfloor \log_2 l \rfloor$;
5     Generate sets $S^l_{1,1}$, $S^l_{1,2}$, nodes $n^l_{1,1}$, $n^l_{1,2}$, indexes $L^l_{1,1}$, $L^l_{1,2}$;
6     for $i = 2$ to $L$ do
7       for $j = 1$ to $2^{i-1}$ do
8         Generate sets $S^l_{i,2j-1}$, $S^l_{i,2j}$, nodes $n^l_{i,2j-1}$, $n^l_{i,2j}$, indexes $L^l_{i,2j-1}$, $L^l_{i,2j}$;
9 end

[Fig. 2. An HS-Tree example for $S_7$. Level 1: "bro" {1,2,3}; "ther" {1}, "thel" {2}, "athe" {3}. Level 2: "b" {1,2,3}; "ro" {1,2,3}; "th" {1,2}, "at" {3}; "er" {1}, "el" {2}, "he" {3}.]

We first group the strings by length and let $S_l$ denote the group of strings with length $l$. For each group $S_l$, we build a complete binary tree, where the root is a dummy node. We partition each string $s \in S_l$ into two disjoint segments, where the first segment is the prefix of $s$ with length $\lfloor |s|/2 \rfloor$ and the second segment is the suffix of $s$ with length $\lceil |s|/2 \rceil$, where $|s|$ is the length of $s$.
Let $S^l_{1,1}$ and $S^l_{1,2}$ respectively denote the set of the first segments and that of the second segments for the strings in $S_l$. Based on these two sets, we generate two children of the root, i.e., two nodes in the first level, $n^l_{1,1}$ and $n^l_{1,2}$, where in $n^l_{i,j}$ the index $i$ denotes the level and $j$ denotes the sibling. (The level of the root is 0.) For each tree node $n^l_{i,j}$, we build an inverted index, where the entries are the segments in $S^l_{i,j}$ and each segment is associated with an inverted list $L^l_{i,j}$, which is a list of strings that contain the segment.

Next we recursively construct the tree. For each node $n^l_{i,j}$, we partition each segment in $S^l_{i,j}$ into two segments (using the same partition method as above) and let $S^l_{i+1,2j-1}$ and $S^l_{i+1,2j}$ respectively denote the set of the first segments and that of the second segments. Then node $n^l_{i,j}$ has two children, $n^l_{i+1,2j-1}$ and $n^l_{i+1,2j}$, with respect to $S^l_{i+1,2j-1}$ and $S^l_{i+1,2j}$. We also build the inverted indexes $L^l_{i+1,2j-1}$ and $L^l_{i+1,2j}$. The procedure terminates when there exists a segment in the level with length 1. In other words, the maximum level is $\lfloor \log_2 l \rfloor$. Figure 1 illustrates the index structure.

Algorithm 1 shows how to build the HS-Tree index. It first groups strings by length (line 2). For each group $S_l$, it builds a hierarchical index with $L$ levels, where $L = \lfloor \log_2 l \rfloor$ (line 4). In level $i$, it partitions each segment of level $i-1$ into 2 segments and constructs the inverted index $L^l_{i,j}$ for the $j$-th segment in level $i$ (lines 6-8).

Example 2: Consider the string set in Table I. We first group the strings by length and then build an HS-Tree for each length. Take group $S_7$ in Figure 2 as an example. The group length is 7. The maximal level is $L = \lfloor \log_2 7 \rfloor = 2$. Consider string $s_1$ = "brother". In level 1, $s_1$ is partitioned into two segments {"bro", "ther"}. Then in level 2, these two segments are each partitioned into 2 segments, {"b", "ro"} and {"th", "er"}. Similarly, we can partition strings $s_2$ and $s_3$ to build the index.

Space Complexity: We analyze the space complexity of the HS-Tree.
Suppose $l_{min}$ is the minimum string length and $l_{max}$ is the maximum string length. For each group $S_l$, the HS-Tree index includes the segments and the inverted indexes. The tree for $S_l$ contains $\lfloor \log_2 l \rfloor$ levels. In the $i$-th level, there are $2^i$ segments. Thus each string is partitioned into at most $\sum_{j=1}^{\lfloor \log_2 l \rfloor} 2^j = O(l)$ segments. Each string is therefore contained in at most $O(l)$ inverted lists, and the space complexity of the HS-Tree is $O(\sum_{l=l_{min}}^{l_{max}} l \times |S_l|)$, i.e., the total number of characters in all input strings.

Discussion: For ease of presentation, we assume each node in the HS-Tree has two children. Our method can be easily extended to support the case where each node has more than two children, like a B-tree. In addition, we focus on the in-memory setting; our method can be extended to support a disk-based setting (like a B-tree). Due to space constraints, we leave this as future work.
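The construction described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `split_segment` and `build_hs_tree`, the nested-dictionary layout, and the decision to skip strings shorter than 2 characters are all assumptions made for the example.

```python
from collections import defaultdict


def split_segment(seg):
    """Split a segment into a prefix of length floor(n/2) and a suffix of length ceil(n/2)."""
    half = len(seg) // 2
    return seg[:half], seg[half:]


def build_hs_tree(strings):
    """Return index[length][level][j][segment] -> list of string ids.

    Levels start at 1 (the children of the dummy root) and j is 1-based,
    mirroring the n_{i,j} notation in the text. String ids are 1-based too.
    """
    index = {}
    for sid, s in enumerate(strings, start=1):
        n = len(s)
        if n < 2:
            continue  # assumption: strings too short to partition are skipped here
        max_level = n.bit_length() - 1  # equals floor(log2(n)) for n >= 1
        tree = index.setdefault(n, {})
        segments = [s]
        for level in range(1, max_level + 1):
            # Every segment of the previous level is split into two children.
            segments = [part for seg in segments for part in split_segment(seg)]
            nodes = tree.setdefault(level, defaultdict(lambda: defaultdict(list)))
            for j, seg in enumerate(segments, start=1):
                nodes[j][seg].append(sid)
    return index


if __name__ == "__main__":
    idx = build_hs_tree(["brother", "brothel", "broathe"])
    print(dict(idx[7][1][1]))  # first-level prefixes
    print(dict(idx[7][2][3]))  # third node of level 2
```

For the three length-7 strings of Table I, the node for level 2, sibling 3 holds "th" for strings 1 and 2 and "at" for string 3, which agrees with Example 2.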

IV. THRESHOLD-BASED SIMILARITY SEARCH

In this section, we devise an efficient algorithm, HS-Search, to answer threshold-based similarity search queries using the HS-Tree index (see Section IV-A). We first introduce a filter-verification framework and then develop novel techniques to improve both the filter step (see Section IV-B) and the verification step (see Section IV-C).

A. The HS-Search Algorithm

Consider a query $q$ with a threshold $\tau$. By the length filter (two strings cannot be similar if their length difference is larger than $\tau$), we only need to access the HS-Trees with lengths between $|q| - \tau$ and $|q| + \tau$. Consider the HS-Tree with length $l \in [|q| - \tau, |q| + \tau]$. In the $i$-th level, the strings are partitioned into $2^i$ segments. For $2^i \ge \tau + 1$, by the pigeonhole principle, any string $s$ in $S_l$ cannot be similar to $q$ if $q$ has no substring matching a segment of $s$. Moreover, if $s$ is similar to $q$, $q$ must contain at least $2^i - \tau$ segments of $s$, as formalized in Lemma 1. On the contrary, if $2^i < \tau + 1$, any string in $S_l$ may be similar to $q$ even if $q$ has no substring matching a segment of the string.

Lemma 1: Given a string $s$ with $2^i$ segments, a query $q$ and a threshold $\tau$, if $s$ is similar to $q$, $q$ must contain at least $2^i - \tau$ segments of $s$.

Example 3: Consider a query $q$ = "swaingbe", the data strings $s_5$ = "swingable" and $s_6$ = "deduction", and $\tau = 2$. In level 2, the 4 segments of $s_5$ are {"sw", "in", "ga", "ble"}, and $q$ has $4 - 2 = 2$ common substrings, "sw" and "in", with $s_5$; thus $s_5$ is a candidate for query $q$ and $\tau = 2$. On the contrary, the 4 segments of $s_6$ = "deduction" are {"de", "du", "ct", "ion"}, and $s_6$ has no segment matching a substring of $q$. So we can safely prune $s_6$.

Generally, consider a level $i \ge \log_2(\tau + 1)$. If a string has fewer than $2^i - \tau$ segments that match substrings of the query, we can prune it. In other words, we only need to check the candidate strings which share at least $2^i - \tau$ common segments with $q$.
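The count condition of Lemma 1 can be checked directly on Example 3. The sketch below is only an illustration under simplifying assumptions: the helper names are hypothetical, segments are matched anywhere in the query (positions and overlaps are ignored, which can only keep extra candidates, never drop a true answer), and no index is used.

```python
def segments_at_level(s, level):
    """Return the 2**level segments of s, splitting floor/ceil at every level."""
    segs = [s]
    for _ in range(level):
        segs = [part for seg in segs
                for part in (seg[:len(seg) // 2], seg[len(seg) // 2:])]
    return segs


def matched_segments(s, q, level):
    """Count the segments of s at the given level that occur somewhere in q."""
    return sum(1 for seg in segments_at_level(s, level) if seg in q)


def passes_count_filter(s, q, tau, level):
    """Lemma 1: a similar string must share at least 2**level - tau segments with q."""
    return matched_segments(s, q, level) >= 2 ** level - tau


if __name__ == "__main__":
    q = "swaingbe"
    for s in ("swingable", "deduction"):
        print(s, segments_at_level(s, 2), passes_count_filter(s, q, tau=2, level=2))
```

With $\tau = 2$ and level 2, "swingable" shares two segments with the query and is kept, while "deduction" shares none and is pruned, as in Example 3.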
To facilitate the checking, we can utilize the inverted index on each node to identify the candidates, which will be discussed in detail later. As there are many levels $i$ such that $i \ge \log_2(\tau + 1)$, we can use any of these levels to identify candidates. The deeper the level is, the shorter the segments are, and thus the lower the pruning power is. Thus we use the minimal level $\lceil \log_2(\tau + 1) \rceil$, which has the longest segments.

Formally, consider a query $q$ and a group $S_l$. We first locate the $i = \lceil \log_2(\tau + 1) \rceil$-th level. For each node $n^l_{i,j}$ ($1 \le j \le 2^i$), we compute the length of the segments in this node, $Len_{i,j}$. (It is worth noting that the segments in each node have the same length.) To check whether $q$ has a substring matching a segment in node $n^l_{i,j}$, we only need to enumerate the set of substrings of $q$ with length $Len_{i,j}$, denoted as $W(q, L^l_{i,j})$. We will discuss how to reduce the size of $W(q, L^l_{i,j})$ in Section IV-B. Next we find the strings which have at least $2^i - \tau$ segments matching the query. To this end, for each substring in $W(q, L^l_{i,j})$, we look up the substring in the inverted index of the node and retrieve its inverted list. Next we compute the strings that appear at least $2^i - \tau$ times on the inverted lists. Such strings are regarded as candidates; we then verify them to get the results.

Algorithm 2: HS-Search ($S$, $q$, $\tau$)
Input: $S$: the string set; $q$: the query string; $\tau$: the given edit-distance threshold
Output: $R = \{s \in S \mid ED(s, q) \le \tau\}$
1 begin
2   Calculate the level $i = \lceil \log_2(\tau + 1) \rceil$;
3   for $l = |q| - \tau$ to $|q| + \tau$ do
4     for $j = 1$ to $2^i$ do
5       Generate substring set $W(q, L^l_{i,j})$;
6       for $w \in W(q, L^l_{i,j})$ do
7         for $s \in L^l_{i,j}[w]$ do
8           $N_i(s, q) = N_i(s, q) + 1$;
9           if $N_i(s, q) \ge 2^i - \tau$ then
10            if $ED(s, q) \le \tau$ then
11              $R = R \cup \{s\}$;
12 end

Based on Lemma 1, we devise the HS-Search algorithm to support threshold-based string similarity search; the pseudo-code is shown in Algorithm 2. HS-Search first calculates the level $i = \lceil \log_2(\tau + 1) \rceil$.
Then HS-Search utilizes the length filter to reduce the number of visited HS-Trees: for a query string $q$ and threshold $\tau$, only the groups $S_l$ with $|q| - \tau \le l \le |q| + \tau$ are visited. Next HS-Search generates the sets of substrings of $q$, $W(q, L^l_{i,j})$, for $j = 1$ to $2^i$ (line 5). According to Lemma 1, if a string $s$ is similar to $q$, $s$ should appear at least $2^i - \tau$ times in the inverted lists $L^l_{i,j}[w]$ for $1 \le j \le 2^i$ and $w \in W(q, L^l_{i,j})$. To this end, HS-Search checks whether $w \in W(q, L^l_{i,j})$ is in $L^l_{i,j}$. If so, for any string $s$ in the inverted list $L^l_{i,j}[w]$, $s$ and $q$ share a common segment $w$ and we increase $N_i(s, q)$ by 1, where $N_i(s, q)$ denotes the number of matched segments between $s$ and $q$ in level $i$ (line 8). If $N_i(s, q)$ reaches $2^i - \tau$, $s$ is a candidate and HS-Search verifies candidate $s$ (line 10). To improve the performance of HS-Search, we design efficient techniques to reduce the size of the substring set $W(q, L^l_{i,j})$ in Section IV-B and to reduce the verification cost in Section IV-C.

Example 4: Consider the string set in Table I. Suppose we have built the HS-Tree index as shown in Figure 2. Assume the query string is $q$ = "brethor" and the threshold is $\tau = 2$. First, $i = \lceil \log_2(\tau + 1) \rceil = 2$. In level 2 there are 4 segments. If $ED(s, q) \le 2$, $s$ and $q$ must have at least $4 - 2 = 2$ common segments. As $|q| = 7$ and $l_{min} = 7$, we only need to visit the strings with length between 7 and 9 for $\tau = 2$. First we check $L^7_{2,1}$, which contains the segment {"b"}. String $q$ has a matched substring "b" in $L^7_{2,1}$. As strings $s_1$, $s_2$ and $s_3$ in the inverted list of "b" share a common segment with $q$, we increase $N_2(s_1, q)$, $N_2(s_2, q)$ and $N_2(s_3, q)$ by 1. Then we check $L^7_{2,2}$, which contains the segment {"ro"}; string $q$ has no matched substring. Similarly we check $L^7_{2,3}$ and $L^7_{2,4}$. As $q$ has a matched substring "th" in $L^7_{2,3}$ with inverted list {$s_1$, $s_2$}, we increase $N_2(s_1, q)$ and $N_2(s_2, q)$ by 1. Now strings $s_1$ and $s_2$ have two matched segments with $q$, so we verify them and get $ED(q, s_1) = 2$ and $ED(q, s_2) = 3$. We put $s_1$ into the result set. As $N_2(s_3, q) = 1 < 2$, we can safely prune $s_3$.
Then we perform a similar procedure on $L^8_{i,j}$ and $L^9_{i,j}$; no more answers are found and the algorithm terminates.
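The filter-verification flow of Algorithm 2 can be sketched as follows. This is a simplified illustration, not the authors' code: all function names are hypothetical, the index is rebuilt for a single level on each call purely to keep the example short, substrings are selected with the basic window of $\tau$ positions around the segment start rather than the tighter bounds of Section IV-B, verification uses a plain dynamic program with early exit, and strings too short to have the required level are verified directly (an assumption, since the text does not cover that case).

```python
from collections import defaultdict


def segments_with_positions(s, level):
    """Return [(start, segment)] for the 2**level segments of s (0-based starts)."""
    parts = [(0, s)]
    for _ in range(level):
        nxt = []
        for start, seg in parts:
            half = len(seg) // 2
            nxt.append((start, seg[:half]))
            nxt.append((start + half, seg[half:]))
        parts = nxt
    return parts


def within_threshold(a, b, tau):
    """True if ED(a, b) <= tau; stops early once a whole row exceeds tau."""
    if abs(len(a) - len(b)) > tau:
        return False
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        if min(cur) > tau:
            return False
        prev = cur
    return prev[-1] <= tau


def build_level_index(strings, level):
    """index[(length, j)][segment] -> ids of strings whose j-th segment equals it."""
    index = defaultdict(lambda: defaultdict(list))
    for sid, s in enumerate(strings):
        if len(s) < 2 ** level:
            continue  # this length group has no such level
        for j, (_, seg) in enumerate(segments_with_positions(s, level)):
            index[(len(s), j)][seg].append(sid)
    return index


def hs_search(strings, q, tau):
    level = tau.bit_length()   # equals ceil(log2(tau + 1)) for tau >= 0
    need = 2 ** level - tau    # Lemma 1: minimum number of matched segments
    index = build_level_index(strings, level)
    counts = defaultdict(int)
    for length in range(max(len(q) - tau, 2 ** level), len(q) + tau + 1):
        # All strings of one length share the same segment starts and lengths.
        layout = segments_with_positions("x" * length, level)
        for j, (start, seg) in enumerate(layout):
            node = index.get((length, j))
            if not node:
                continue
            seg_len = len(seg)
            matched = set()
            lo, hi = max(0, start - tau), min(len(q) - seg_len, start + tau)
            for p in range(lo, hi + 1):
                matched.update(node.get(q[p:p + seg_len], ()))
            for sid in matched:  # each segment counts at most once per string
                counts[sid] += 1
    candidates = [sid for sid, c in counts.items() if c >= need]
    short = [sid for sid, s in enumerate(strings)
             if len(s) < 2 ** level and abs(len(s) - len(q)) <= tau]
    return sorted(strings[sid] for sid in candidates + short
                  if within_threshold(strings[sid], q, tau))


if __name__ == "__main__":
    table = ["brother", "brothel", "broathe", "breathes", "swingable",
             "deduction", "abna evina", "christopher swenson"]
    print(hs_search(table, "brethor", 2))
```

On the strings of Table I with query "brethor" and $\tau = 2$, the filter keeps "brother", "brothel" and "breathes" as candidates, and only "brother" survives verification, in line with Example 4.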

Complexity: We analyze the time complexity of Algorithm 2. It consists of two parts: the filtering time and the verification time. Before performing Algorithm 2, we need to group strings by length, which takes $O(\sum_{l=l_{min}}^{l_{max}} |S_l|)$. This can be regarded as offline time and is not included in the search time. The time to generate the set $W(q, L^l_{i,j})$ is $O(\tau)$, as discussed in Section IV-B. The total time of selecting substrings for $l \in [|q| - \tau, |q| + \tau]$ and $j \in [1, 2^i]$ is $O(\tau^2)$. The filtering cost is the cost of visiting the inverted lists, which is $\sum_{l \in [|q| - \tau, |q| + \tau]} \sum_{j} \sum_{w \in W(q, L^l_{i,j})} |L^l_{i,j}[w]|$. The verification cost of $q$ and $s$ for threshold $\tau$ is $O(\tau \cdot \min(|q|, |s|))$ [4], and we will further improve the verification cost in Section IV-C.

B. Improving the Filtering Step

To find the candidates of a given query $q$, we need to first generate the set of substrings $W(q, L^l_{i,j})$ and then count the common segments between the strings in $S$ and $q$. To do this efficiently, we reduce the filtering cost in two ways: (1) reduce the size of $W(q, L^l_{i,j})$ for $1 \le j \le 2^i$; and (2) remove invalid segment matchings.

Reduce $W(q, L^l_{i,j})$. A smaller $W(q, L^l_{i,j})$ leads to higher performance. Based on the position filter, a segment in $s$ cannot be matched with a substring of $q$ whose position differs too much. For example, consider $s_5$ = "swingable", query $q$ = "blending" and threshold $\tau = 2$. The segments of $s_5$ in the second level are {"sw", "in", "ga", "ble"}. The substring "ble" of $q$ cannot be matched with the fourth segment because their position difference is larger than $\tau$. The position filter can be strengthened by considering the length difference of $s$ and $q$, denoted as $\Delta$. Suppose the start position of segment $w \in L^l_{i,j}$ is $Pos_{i,j}$ and its length is $Len_{i,j}$. We can easily get the lower bound of the start positions of substrings in $q$, denoted as $LB = \max(1, Pos_{i,j} - \lfloor (\tau - \Delta)/2 \rfloor)$, and the upper bound, denoted as $UB = \min(|q| - Len_{i,j} + 1, Pos_{i,j} + \lfloor (\tau + \Delta)/2 \rfloor)$. We only need to check the substrings starting within the range $[LB, UB]$.
Moreover, by looking from both the left-side and the right-side perspective [4], we can further tighten $LB$ and $UB$ to $LB = \max(1, Pos_{i,j} - (j - 1), Pos_{i,j} + \Delta - (\tau + 1 - j))$ and $UB = \min(|q| - Len_{i,j} + 1, Pos_{i,j} + (j - 1), Pos_{i,j} + \Delta + (\tau + 1 - j))$. Thus $W(q, L^l_{i,j}) = \{q[pos_{i,j}, Len_{i,j}]\}$, where $q[pos_{i,j}, Len_{i,j}]$ is the substring of $q$ with start position $pos_{i,j} \in [LB, UB]$ and length $Len_{i,j}$. The correctness is stated in Lemma 2.

Lemma 2: Given a query string $q$ and a threshold $\tau$, using the set $W(q, L^l_{i,j}) = \{q[pos_{i,j}, Len_{i,j}] \mid pos_{i,j} \in [LB, UB]\}$ to find matching candidates, our method will not miss any result.

To identify the strings $s$ with $N_i(s, q) \ge 2^i - \tau$, we use the list-merge algorithm [], which utilizes a heap to efficiently identify the candidates without accessing every string on the inverted lists.

Remove Invalid Matchings. The above method identifies the candidates by simply counting the number of matched common segments. However, the substrings of $q$ matching different segments may conflict with each other, where conflict means that the two matched substrings overlap. This matters because the segments of the data strings are disjoint, so the matched substrings of $q$ should also be disjoint. For example, let $q$ = "acompany", $s$ = "accomplish" and $\tau = 2$. If the substring "ac" of $q$ matches the first segment "ac", the substring "com" cannot match the second segment, because "ac" and "com" overlap in $q$ but are disjoint in $s$. If we do not eliminate such conflicting matchings, they introduce false positives. In this example, we get $N_2(s, q) = 2$ in the segment count step, but string $s$ has only one common segment with string $q$. Such false positives result in a larger candidate set and extra verification cost.

Algorithm 3: SEGCOUNT ($M_i(s, q)$)
Input: $M_i(s, q)$: matched segments of $s$ and $q$ in level $i$.
Output: number of matched segments without conflict.
1 begin
2   $D[1] = 1$;
3   for $j = 2$ to $N_i(s, q)$ do
4     $D[j] = \max_{1 \le t < j} \{\gamma(j, t) \cdot D[t]\} + 1$;
5   return $D[N_i(s, q)]$;
6 end
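A small Python sketch of the conflict-aware count in Algorithm 3 follows. The representation is an assumption made for illustration: each match is given as a pair (start position in $q$, length), listed in segment order, and two matches are treated as conflict-free when the earlier one ends before the later one starts. Unlike the pseudo-code, which returns the last entry of $D$, this sketch returns the largest entry; the two coincide in the examples from the text.

```python
def seg_count(matches):
    """Largest number of matched segments whose substrings of q do not overlap.

    matches: list of (q_start, length) pairs, ordered by segment position in s.
    D[j] is the best chain of non-conflicting matches that ends with match j.
    """
    if not matches:
        return 0
    D = [1] * len(matches)
    for j in range(1, len(matches)):
        best = 0
        for t in range(j):
            start_t, len_t = matches[t]
            no_conflict = start_t + len_t <= matches[j][0]  # plays the role of gamma(j, t)
            if no_conflict:
                best = max(best, D[t])
        D[j] = best + 1
    return max(D)


if __name__ == "__main__":
    # q = "acompany": "ac" starts at 0, "com" starts at 1, so the two overlap.
    print(seg_count([(0, 2), (1, 3)]))
    # q = "were acomofortable": "ac" at 5, "com" at 6, "mo" at 8.
    print(seg_count([(5, 2), (6, 3), (8, 2)]))
```

For the "acompany" example the count drops from 2 to 1, and for the three matches "ac", "com", "mo" in "were acomofortable" it drops from 3 to 2.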
To solve this problem, we design a dynamic-programming algorithm to calculate the maximum number of matched segments while eliminating the conflicts between matched segments (removing overlapping matchings). Let $D[j]$ denote the maximum number of matched segments without conflict among the first $j$ segments. To calculate $D[j]$, we need to find the last matched segment $t$ without conflict for $t < j$, and compute the number of matched segments without conflict using this segment. Then we consider whether the current matched segment conflicts with the last matched segment. We use a function $\gamma(j, t)$ to judge whether two matches $j$ and $t$ conflict: $\gamma(j, t) = 1$ if there is no conflict, and $\gamma(j, t) = 0$ otherwise. Then we get the following recursion formula:

$$D[j] = \begin{cases} 1, & j = 1 \\ \max_{1 \le t < j} \{\gamma(j, t) \cdot D[t]\} + 1, & \text{otherwise} \end{cases} \qquad (1)$$

We devise a dynamic-programming algorithm based on this formula, as shown in Algorithm 3. The algorithm takes as input the set of matched segments between $s$ and $q$ in level $i$, denoted as $M_i(s, q)$, which can be easily obtained when computing $N_i(s, q)$. The algorithm outputs the number of matched segments without conflict based on Equation 1. The time complexity of this algorithm is $O(x^2)$, where $x = N_i(s, q)$. The maximum value of $x$ is $2^i$ in level $i$, but in practice the number of matched segments is far smaller than $2^i$ thanks to the efficient substring selection methods. Thus the cost of this algorithm is negligible. We can integrate this observation into Algorithm 2 (replacing line 8 of Algorithm 2 with Algorithm 3) to enhance the pruning power.

Example 5: Suppose string $s$ = "are accommodate to", $q$ = "were acomofortable" and $\tau = 5$. In the segment matching step, we search in level 3 and get $M_3(s, q)$ = {"ac", "com", "mo"} and $N_3(s, q) = 3$. However, when we perform Algorithm 3 on $M_3(s, q)$, we get $D[1] = 1$, $D[2] = 1$ and $D[3] = 2$. So there are 2 rather than 3 matched segments. As $2 < 8 - 5 = 3$, we can safely prune $s$.

C. Improving the Verification Step

In our HS-Search algorithm, after generating the candidate strings, we verify whether their real edit distances to the query are within the threshold $\tau$. In this section we devise novel techniques to improve the verification step.

To compute the edit distance between two strings $q$ and $s$, a naive method is to use the dynamic-programming algorithm. Given two strings $q$ and $s$, it utilizes a matrix $C$ with $|q| + 1$ rows and $|s| + 1$ columns, where $C[i][j]$ is the edit distance between the prefixes $q[1, i]$ and $s[1, j]$ [8]. If we only want to check whether the edit distance between two strings is within a given threshold $\tau$, we can reduce the complexity by only computing the $C[i][j]$ values for $|i - j| \le \tau$, with a cost of $(2\tau + 1) \cdot \min(|q|, |s|)$. A length-aware method has been proposed to improve the time complexity to $\tau \cdot \min(|q|, |s|)$ [4]. These algorithms can also terminate early when the values in a row are all larger than $\tau$, which further improves the running time.

For our HS-Search algorithm, we can utilize the set of matched segments in $M_i(s, q)$ to avoid unnecessary computations. As we can see from Section IV-A, given a candidate string $s$, there are $y = N_i(s, q)$ matched segments. We align $q$ and $s$ based on these matched segments and partition them into $2y + 1$ parts, including $y$ matched segments and $y + 1$ unmatched parts. We only need to verify whether $ED(q, s) \le \tau$ in this alignment. Suppose the set of matched segments is $M_i(s, q) = \{m_1, m_2, \ldots, m_y\}$, and the set of unmatched parts of query $q$ (string $s$) is $Q = \{q_1, q_2, \ldots, q_{y+1}\}$ ($S = \{s_1, s_2, \ldots, s_{y+1}\}$), where $q_1$ ($s_1$) is the substring before $m_1$, $q_j$ ($s_j$) is the substring between $m_{j-1}$ and $m_j$ for $j \in [2, y]$, and $q_{y+1}$ ($s_{y+1}$) is the substring after $m_y$. If two matched segments $m_{j-1}$ and $m_j$ are consecutive, $s_j$ ($q_j$) is empty. We denote the total edit distance as $TED = \sum_{j=1}^{y+1} ED(s_j, q_j)$. If $TED$ is larger than $\tau$, $s$ is not similar to $q$ in this alignment and we can discard $s$; otherwise we add $s$ into the result set. We call this method SingleThreshold.

[Fig. 3. The MultiExtension Verification (y = 4).]

We can further improve the performance of SingleThreshold by assigning each $ED(q_j, s_j)$ a tighter threshold bound.
As we can see, the part $s_j$ lies between the segments $m_{j-1}$ and $m_j$. Let $p_j$ denote the order of $m_j$ among the $2^i$ segments in $s$. For string $s$, $s_j$ consists of $p_j - p_{j-1} - 1$ segments (for $j = 1$, the value is $p_1 - 1$, and for $j = y + 1$, the value is $2^i - p_y$). For a given threshold $\tau$, if $y$ is exactly $2^i - \tau$, there are exactly $\tau$ unmatched segments, so by the pigeonhole principle each unmatched segment must contain exactly one of the $\tau$ edit operations. In this case, if we find two errors in one segment (in other words, more than $\tau_j$ errors appear in part $j$), we can safely terminate the verification step. The threshold $\tau_j$ of each part $j$ is calculated as follows:

$$\tau_j = \begin{cases} p_1 - 1, & j = 1 \\ 2^i - p_y, & j = y + 1 \\ p_j - p_{j-1} - 1, & \text{otherwise} \end{cases} \qquad (2)$$

We propose an early termination technique (Lemma 3).

Lemma 3: Consider two strings $s$ and $q$ with exactly $y = 2^i - \tau$ common segments. If $ED(q_j, s_j) > \tau_j$ for some part $j$, then $ED(q, s) > \tau$.

Based on Lemma 3, we devise the verification algorithm MultiExtension, as shown in Algorithm 4. In this algorithm, we can use the length-aware method to efficiently verify $ED(q_j, s_j)$.

Algorithm 4: MultiExtension ($s$, $q$, $\tau$, $M_i(s, q)$)
Input: $s$, $q$: the strings to be verified; $\tau$: the given threshold; $M_i(s, q)$: matched segments of $s$ and $q$ in level $i$.
Output: the verification result
1 begin
2   Generate sets $S$ and $Q$ based on $M_i(s, q)$;
3   Generate thresholds based on $M_i(s, q)$;
4   for $j = 1$ to $N_i(s, q) + 1$ do
5     Calculate $ED(q_j, s_j)$ with the length-aware method;
6     if $ED(q_j, s_j) > \tau_j$ then return;
7   Add $s$ into the result set;
8 end

Next we walk through the two verification algorithms, SingleThreshold and MultiExtension, using the following example, as shown in Figure 3.

Example 6: Consider the two strings $s_7$ = "abna evina" from Table I and $q$ = "ovner oevi", with threshold 5. We perform HS-Search on level 3 with 8 segments. The segments of $s_7$ on level 3 are respectively {"a", "b", "n", "a", " ", "ev", "i", "na"}. As we can see from Figure 3, $s_7$ and $q$ have the matched segments $m_1$ = "n", $m_2$ = " " (a space) and $m_3$ = "ev", with $p_1 = 3$, $p_2 = 5$ and $p_3 = 6$. Then we can generate the sets $S$ = {$s_1$ = "ab", $s_2$ = "a", $s_3$ = "", $s_4$ = "ina"} and $Q$ = {$q_1$ = "ov", $q_2$ = "er", $q_3$ = "o", $q_4$ = "i"}.
For MultiExtension, the thresholds of the four parts are respectively 2, 1, 0, 2. We calculate $ED(s_1, q_1) = 2 \le 2$ and $ED(s_2, q_2) = 2 > 1$, so MultiExtension terminates and discards $s$. SingleThreshold, however, continues to calculate $ED(s_3, q_3) = 1$ and $ED(s_4, q_4) = 2$, obtains $TED = 2 + 2 + 1 + 2 = 7 > \tau$, and only then discards $s$. Thus MultiExtension outperforms SingleThreshold.

V. TOP-K SIMILARITY SEARCH

In this section, we study the top-k similarity search problem. Different from the threshold-based similarity search, the top-k similarity search has no fixed threshold. Although we can extend the threshold-based method to find top-k answers by enumerating thresholds incrementally until $k$ results are found, this algorithm is rather expensive because it executes multiple (unnecessary) search operations for each threshold and involves many duplicated computations. To address this issue, we devise an efficient algorithm HS-Topk to support top-k similarity search using our HS-Tree index. The basic idea is to first access the promising strings that are most likely to be similar to the query, so as to prune large numbers of dissimilar strings effectively. To this end, we first propose a batch-pruning-based method in Section V-A and then present an effective pruning strategy in Section V-B. Finally we devise our HS-Topk algorithm in Section V-C.

A. The Batch-Pruning-Based Method

We maintain a priority queue $Q$ to keep the current $k$ promising results. Let $UB_Q$ denote the largest edit distance between the strings in $Q$ and the query, i.e., $UB_Q = \max\{ED(s, q) \mid s \in Q\}$. Obviously $UB_Q$ is an upper bound of the edit distances of the top-k results to the query. In other words, we can prune a string if its edit distance to the query is larger than $UB_Q$. Next we discuss how to utilize the queue to find top-k answers.

Given a query $q$, we still access the HS-Tree in a top-down manner. Consider the $i$-th level of the HS-Tree for strings of length $l$. For each node $n_{i,j}$, we generate the substrings $W(q, L^{l}_{i,j})$ for $1 \le j \le 2^i$ and identify the corresponding inverted lists $L^{l}_{i,j}$. We group the strings in the inverted lists based on the number of substrings in $W(q, L^{l}_{i,j})$ that they contain. Let $B_x$ denote the group of strings containing $x$ substrings. As each string contains at most $2^i$ segments, there are at most $2^i$ groups. Strings in $B_x$ share $x$ common segments with the query. If $2^i < UB_Q$, all strings in $B_x$ can be regarded as candidates for every $x \le 2^i$. Otherwise, there are $2^i - x$ mismatch segments for strings in $B_x$. As a mismatch segment leads to at least 1 edit error, the lower bound of the edit distances of strings in $B_x$ to query $q$ is $LB_{B_x} = 2^i - x$. Obviously if $LB_{B_x} > UB_Q$, we can prune the strings in $B_x$ based on Lemma 4. In other words, we only need to visit the groups such that $x \ge 2^i - UB_Q$. On the other hand, the larger $x$ is, the more likely the strings in $B_x$ are to be among the top-k answers. Thus, we want to first access the strings in groups with larger $x$. These two observations motivate us to devise a batch-pruning-based method.

Lemma 4: If $LB_{B_x} > UB_Q$, strings in $B_x$ can be pruned.
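The grouping and the queue maintenance described above can be sketched as follows. This is a hedged illustration, not the paper's code: `batch_groups` and `TopK` are hypothetical names, the per-string match counts $N_i(s, q)$ are assumed to have been computed already from the inverted lists, and groups are kept when $2^i - x \le UB_Q$.

```python
import heapq
from collections import defaultdict


def batch_groups(match_counts, num_segments, ub):
    """Group string ids by matched-segment count x, dropping groups whose
    lower bound LB = num_segments - x exceeds ub, largest x first."""
    groups = defaultdict(list)
    for sid, x in match_counts.items():
        if num_segments - x <= ub:
            groups[x].append(sid)
    return [(x, sorted(groups[x])) for x in sorted(groups, reverse=True)]


class TopK:
    """Keeps the k smallest (distance, string) pairs seen so far in a max-heap."""

    def __init__(self, k: int):
        self.k = k
        self.heap = []  # entries are (-distance, string)

    @property
    def ub(self):
        """UB_Q: the largest distance in the queue, or infinity while it is not full."""
        return -self.heap[0][0] if len(self.heap) == self.k else float("inf")

    def offer(self, dist: int, s: str) -> None:
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (-dist, s))
        elif dist < self.ub:
            heapq.heapreplace(self.heap, (-dist, s))
```

A caller would iterate over `batch_groups(...)` in the returned order, verify each string, and call `offer`, re-reading `ub` after every update so that later groups are pruned against the tightened bound.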
For level $i$, if $2^i \ge UB_Q + 1$, we can terminate because we have found all top-k answers within threshold $UB_Q$ using the nodes in the first $i$ levels; otherwise, we retrieve the inverted lists of the substrings in $W(q, L^{l}_{i,j})$ from the $i$-th level and identify the strings with $N_i(s, q) \ge 2^i - UB_Q$ from these lists. Next we group the strings into $B_x$ ($x \in [2^i - UB_Q, 2^i]$) based on the number of matched segments. Then, we visit the groups in descending order of $x$. For each string $s \in B_x$, we compute the real edit distance between $s$ and $q$. If $ED(s, q) < UB_Q$, we update the priority queue $Q$ and $UB_Q$ using $s$. Iteratively, we can correctly find the top-k answers. This batch-pruning-based method reduces not only the filtering cost, because we only need to do segment counting once for a level $i$ while the threshold-based method needs to perform it $2^i$ times for each threshold, but also the verification time, because we can use a tighter bound $UB_Q$ to do verification.

Fig. 4. The Batch-Pruning-Based Method.

Example 7: Consider a top-2 query $q$ = "breahers" on the dataset in Table I. Suppose string $s_4$ = "breathes" is already in $Q$ as it has a common segment "brea" in level 1 with $q$. $ED(q, s_4) = 2$ and $UB_Q = \infty$. Then, consider the HS-Tree for strings with length 7 ($S_7$) in Figure 2. We start from level 1. As there is no matched segment, we move to level 2. There are two matched segments "br" and "er". Their inverted lists are $\{s_1, s_2, s_3\}$ and $\{s_1\}$, where $s_1$ = "brother", $s_2$ = "brothel", $s_3$ = "breathe". We have $N_2(s_1, q) = 2$, $N_2(s_2, q) = 1$ and $N_2(s_3, q) = 1$. We put $s_1$ into $B_2$ and verify $B_2$. As $ED(q, s_1) = 3 < UB_Q$, we add $s_1$ into $Q$ and update $UB_Q = 3$. As $2^2 \ge UB_Q + 1$, the algorithm is terminated and the strings in $B_1 = \{s_2, s_3\}$ are pruned.

B. The Greedy-Match Strategy

The batch-pruning-based method can effectively prune strings without enough common segments.
If each mismatch segment only contains one edit error, this method is very effective as it can accurately estimate the lower bound. However, if one mismatch segment involves more than one consecutive error, the estimation is not accurate, and this method fails to filter such candidates. For example, consider query $q$ = "broader" on the dataset in Table I. For $UB_Q = 1$, strings $s_1$ = "brother" and $s_2$ = "brothel" can pass the segment filter as they share a common segment "bro", but it is obvious that their edit distances to $q$ are larger than 1 as the second segment contains 3 errors.

To address this issue, we devise a greedy-match strategy to prune strings with consecutive errors by utilizing our hierarchical index structure. Consider a string $s$ in level $i$ with $N_i(s, q) \ge 2^i - UB_Q$. In this case, we cannot prune $s$. Instead of directly verifying string $s$, we go to the next level $i + 1$ and estimate a tighter bound by counting the number of matched segments in level $i + 1$ (i.e., $N_{i+1}(s, q)$). If the number is smaller than $2^{i+1} - UB_Q$, we can prune string $s$ based on Lemma 4. If the string is not pruned in level $i + 1$, we check level $i + 2$. Iteratively, if the string is still not pruned in the leaf level, we compute the real edit distance based on the method in Section IV-C. It is worth noting that the larger the level is, the shorter a segment is, and the higher the probability that dissimilar strings with consecutive errors can be pruned.

Next we discuss how to efficiently compute the number of matched segments between $s$ and $q$. A naive method enumerates each segment of $s$ and checks whether it appears as a substring of $q$. This method has to enumerate many segments. Alternatively, we propose an effective method. Based on the characteristics of the HS-Tree, if the $j$-th segment in level $i$ matches a substring of $q$, the $(2j - 1)$-th and $2j$-th segments in level $i + 1$ must match two substrings of $q$, so we do not need to check them again. Thus, we only need to check the children of the mismatch segments in level $i$.
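The level-by-level counting described above can be sketched as follows. This is a simplified illustration with hypothetical names (`split_segments`, `greedy_match`): a segment counts as matched when it occurs anywhere in $q$ (the naive substring test, not a lookup in the position-restricted sets $W(q, L)$), each segment is halved with the shorter half first, and `max_level` is assumed to be at most $\lfloor \log_2 |s| \rfloor$ so that no segment becomes empty.

```python
def split_segments(segments):
    """Halve every segment (shorter half first) to obtain the next level."""
    out = []
    for seg in segments:
        mid = len(seg) // 2
        out.extend([seg[:mid], seg[mid:]])
    return out


def greedy_match(s: str, q: str, level: int, ub: int, max_level: int) -> bool:
    """Return False as soon as some level r > level has fewer than 2^r - ub
    matched segments; return True if s survives down to max_level."""
    segments = [s]
    for _ in range(level):
        segments = split_segments(segments)
    matched = [seg in q for seg in segments]
    for r in range(level + 1, max_level + 1):
        segments = split_segments(segments)
        # A matched parent passes its match down to both children,
        # so only children of mismatched parents are tested against q.
        inherited = [m for m in matched for _ in (0, 1)]
        matched = [inh or seg in q for inh, seg in zip(inherited, segments)]
        if sum(matched) < 2 ** r - ub:
            return False
    return True
```

Thanks to short-circuit evaluation of `inh or seg in q`, the substring test is skipped for every child of a matched parent, which mirrors the observation that matched segments need not be checked again.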

Algorithm 5: GREEDYMATCH($s$, $q$, $i$, $\tau$)
Input: $s$, $q$: the strings to be verified; $i$: level; $\tau$: the current threshold; $M_i(s, q)$: matched segments of $s$, $q$ in level $i$.
Output: True or False
begin
    for $r = i + 1$ to $n$ do
        for segments $w \in M_{r-1}(s, q)$ do
            put $w$'s two subsegments into $M_r(s, q)$;
        for $j = 1$ to $2^r$ do
            if segment $j \notin M_r(s, q)$ then
                check the segment $j$;
                if find a matched substring then
                    put segment $j$ into $M_r(s, q)$;
        if $N_r(s, q) < 2^r - \tau$ then return False;
    return True;
end

The pseudo-code of the greedy-match strategy is shown in Algorithm 5. It gets the set of matched segments $M_i(s, q)$ in the current level $i$ and then uses $M_i(s, q)$ to generate $M_{i+1}(s, q)$. This iterative matching procedure for different levels does not involve heavy filtering cost, because segment $j$ in level $i$ corresponds to segments $2j - 1$ and $2j$ in level $i + 1$, and such matched segments can be passed down to lower levels. Besides, as we have generated the substrings $W(q, L^{|s|}_{r,j})$ for each level $r$ ($1 \le r \le n$), when we look for a matched substring for segment $j$, we just check $W(q, L^{|s|}_{r,j})$ and do not need to scan inverted lists any more.

Example 8: Consider $s_8$ = "christopher swenson" and query $q$ = "atrmstophbwcmrense". The length of $s_8$ is 19, so there are 4 levels. Suppose the current threshold $UB_Q$ is 3. It requires $4 - 3 = 1$ matched segment in level 2, $8 - 3 = 5$ segments in level 3 and $16 - 3 = 13$ segments in level 4. We find a matched segment "stoph" in level 2. Instead of verification, we continue to look for another $5 - 2 = 3$ matched segments in level 3. We find a matched segment "en" in level 3. As there are in total 3 matched segments in level 3, which is smaller than the required number of matched segments 5, $s_8$ will be pruned and we do not need to compute its real edit distance to $q$.

C. The HS-Topk Algorithm

We combine the batch-pruning-based method and the greedy-match strategy and devise a top-k similarity search algorithm HS-Topk. The pseudo-code is shown in Algorithm 6. It first initializes the queue $Q$ and sets the threshold $UB_Q = \infty$ (line 2).
Then it searches the HS-Tree from the root (line 3). If the value of $UB_Q$ is no larger than the minimum threshold supported by the current level ($2^i \ge UB_Q + 1$), the algorithm terminates and we can safely prune the remaining strings (line 4). For each level, we only visit the HS-Trees with lengths between $|q| - UB_Q$ and $|q| + UB_Q$ based on length filtering (line 5). For each HS-Tree, it identifies the matched segments, groups strings with different numbers of matched segments into different groups, and visits the groups sorted by the number of matched segments in descending order (lines 6-7). For each string in the current group $B_x$, we perform the greedy-match strategy (line 9). If the string passes the filter, we verify the candidate using threshold $UB_Q$ based on the techniques in Section IV-C (line 10). If $ED(s, q) < UB_Q$, we use $s$ to update $Q$ and $UB_Q$ (line 12).

Algorithm 6: HS-Topk($S$, $q$, $k$)
Input: $q$: the query string; $k$: the size of result set; $S$: the string set
Output: $R$: the top-k answer
1  begin
2      Initialize queue $Q$ and $UB_Q$;
3      for $i = 1$ to $L$ do
4          if $2^i \ge UB_Q + 1$ then return;
5          for $l \in [|q| - UB_Q, |q| + UB_Q]$ do
6              Identify strings with occurrence number no smaller than $2^i - UB_Q$ and group them to $B_x$;
7              for $x = 2^i$ to $2^i - UB_Q$ do
8                  for each string $s \in B_x$ do
9                      if GREEDYMATCH($s$, $q$, $i$, $2^i - UB_Q$) then
10                         Verify $ED(s, q)$;
11                         if $ED(s, q) < UB_Q$ then
12                             Update $Q$ and $UB_Q$;
13 end

Fig. 5. An Example for Greedy-Match Strategy (segments of "christopher swenson" at levels 1 to 3, matched greedily against $q$ = "atrmstophbwcmrense").

VI. EXPERIMENTAL STUDY

In this section, we conduct an extensive set of experiments. Our experimental goal is to evaluate the efficiency of our algorithms and compare them with state-of-the-art methods.

A. Experiment Setup

We used three publicly available real datasets in our experiments: DBLP Author, DBLP publication records and Query Log, which are widely used in previous studies [9]. The details of the data sets are shown in Table II.
Author contains short strings, Query Log contains medium-length strings, and DBLP contains long strings.

TABLE II. DATASETS
Datasets     #          Avg Len   Max Len   Min Len
Author       62,
Query Log    464,
DBLP         ,385,

We compared our algorithms with state-of-the-art methods. For threshold-based similarity search, we compared our HS-Search algorithm with Adapt [2], QChunk [6] and B^ed-tree [27]. Although there are some other similarity search algorithms, e.g., Flamingo [] and VChunk [2], it has been shown in previous studies that both Adapt and QChunk outperformed them [2], [6]. So we only compared HS-Search with Adapt and QChunk. For top-k similarity search, we compared our HS-Topk algorithm with Range [5], B^ed-tree and AQ [26]. We obtained the source code of Adapt, Range and B^ed-tree from the authors and implemented QChunk and AQ by ourselves [9]. All the algorithms were implemented in C++ and compiled using GCC with the -O3 flag. All the experiments were run on a Ubuntu server machine with two Intel Xeon E542 CPUs (8 cores, 2.5GHz) and 32GB memory.

B. Evaluation on Threshold-based Search

1) Evaluating Different Verification Algorithms: We first evaluated the verification methods. We implemented three methods: Length-aware, SingleThreshold and MultiExtension. Length-aware is the length-aware method [4]; SingleThreshold is the method that has only one threshold $\tau$; MultiExtension is our method that has separate thresholds for each part based on Lemma 3. All three methods were implemented with early-termination techniques. Figure 6 shows the results by varying edit-distance thresholds on the three datasets. We can observe that SingleThreshold involves less verification time than Length-aware because it avoids duplicated computations on already matched segments and divides the two strings into different parts. For each part, MultiExtension has a different threshold and terminates as soon as the edit distance of one part is larger than the given threshold of that part. Thus, compared with SingleThreshold, MultiExtension terminates earlier. For example, on the Query Log dataset, for $\tau = 5$, Length-aware took 55 milliseconds on average, and SingleThreshold decreased the time to 39 milliseconds, while MultiExtension further reduced it to 24 milliseconds.
2) Filter Time vs Verification Time: Next we evaluated the cost of the filter step and the verification step, and the result is shown in Figure 7. We can see that our segment filter has great filtering power for small thresholds, and a large number of dissimilar strings can be pruned in the filter step, so the verification time is further reduced. For large thresholds, e.g., $\tau = 5$ and $\tau = 2$ in Figure 7(c), the verification time is dominant in the overall time. This is because when the threshold becomes large, we need to do segment matching in lower levels and the segments are much shorter. Shorter segments obviously have more chance to be matched, so the number of candidates is larger than that for small thresholds.

3) Comparison with state-of-the-art methods: We compared our HS-Search algorithm with the state-of-the-art algorithms Adapt, QChunk and B^ed-tree by varying edit-distance thresholds on the three datasets Author, Query Log and DBLP. Figure 8 shows the results. We can see that HS-Search achieved the best performance on all the datasets and outperformed existing algorithms by 3 to 2 times. For example, on the Query Log dataset for $\tau =$ , HS-Search took 6 milliseconds. And the average search times for Adapt, QChunk and B^ed-tree were 32, 5 and 4 milliseconds respectively. Among all the baselines, the overall performance of B^ed-tree was the worst because it had poor filtering power to prune dissimilar strings. Adapt had better performance than QChunk because Adapt took advantage of the adaptive prefix length to remove a large number of candidates.

TABLE III. THRESHOLD-BASED SIMILARITY SEARCH: INDEX. Columns: Dataset, Method, Index Size (MB), Index Construction Time (s). Methods per dataset (Author, Query Log, DBLP): HS-Search, Adapt, QChunk, B^ed-tree. Query Log, QChunk: 72.7.

Our method achieved the best performance for the following reasons.
First, existing algorithms are based on n-grams, and our method is based on segments, which have much stronger filtering power than gram-based methods (as segments are longer than n-grams). Moreover, the segments are selected across the string and not restricted to the prefix. Thus, our method always generates the least number of candidates. Second, compared with other similarity search algorithms, we also design an efficient verification mechanism. In this way, we can take advantage of the results of the filter step and avoid redundant computations. Third, since previous algorithms are based on n-grams, they need to tune the parameter $n$ for different datasets, and even for different thresholds on the same dataset, to achieve the best performance. Our HS-Search algorithm does not need any parameter tuning. So the utility of HS-Search is much better than that of the existing algorithms.

In addition, we compared the index construction time and index size, and the result is shown in Table III. We have the following observations. First, HS-Search had the least index time because it can iteratively divide the segments in different levels and does not need to build a large inverted index like n-gram-based methods. Second, the index size of HS-Tree is nearly the same as that of state-of-the-art n-gram-based methods, because HS-Tree partitions each string of length $l$ into disjoint segments in each level, with $O(l)$ segments in total, and n-gram-based methods generate $l - n + 1$ grams. Thus they generate similar numbers of n-grams/segments and the index sizes are also similar. In addition, in the inverted lists, for a segment, HS-Tree only needs to maintain the strings containing it, but n-gram-based methods also need to store the position of the n-gram, so our index size can be smaller than that of state-of-the-art methods. Third, B^ed-tree organizes similar strings into a B-tree node based on specific orders and does not need to maintain inverted lists, thus its index size is smaller.

4) Scalability: We evaluated the scalability of HS-Search.
We varied the number of strings in each dataset and tested the average search time. Figure 9 shows the result on the three datasets. We can see that as the size of a dataset increased, our method scaled very well for different edit-distance thresholds and achieved near-linear scalability. For example, on the DBLP data set, when the threshold was 2, the average search times for 4, strings, 5, strings and 6, strings were respectively 2, 27, and 33 milliseconds.


More information

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents MARKOV CHAINS AND MARKOV DECISION THEORY ARINDRIMA DATTA Abstract. In this paper, we begin with a forma introduction to probabiity and expain the concept of random variabes and stochastic processes. After

More information

arxiv: v1 [cs.lg] 31 Oct 2017

arxiv: v1 [cs.lg] 31 Oct 2017 ACCELERATED SPARSE SUBSPACE CLUSTERING Abofaz Hashemi and Haris Vikao Department of Eectrica and Computer Engineering, University of Texas at Austin, Austin, TX, USA arxiv:7.26v [cs.lg] 3 Oct 27 ABSTRACT

More information

On the Goal Value of a Boolean Function

On the Goal Value of a Boolean Function On the Goa Vaue of a Booean Function Eric Bach Dept. of CS University of Wisconsin 1210 W. Dayton St. Madison, WI 53706 Lisa Heerstein Dept of CSE NYU Schoo of Engineering 2 Metrotech Center, 10th Foor

More information

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES Separation of variabes is a method to sove certain PDEs which have a warped product structure. First, on R n, a inear PDE of order m is

More information

Traffic data collection

Traffic data collection Chapter 32 Traffic data coection 32.1 Overview Unike many other discipines of the engineering, the situations that are interesting to a traffic engineer cannot be reproduced in a aboratory. Even if road

More information

Unit 48: Structural Behaviour and Detailing for Construction. Deflection of Beams

Unit 48: Structural Behaviour and Detailing for Construction. Deflection of Beams Unit 48: Structura Behaviour and Detaiing for Construction 4.1 Introduction Defection of Beams This topic investigates the deformation of beams as the direct effect of that bending tendency, which affects

More information

<C 2 2. λ 2 l. λ 1 l 1 < C 1

<C 2 2. λ 2 l. λ 1 l 1 < C 1 Teecommunication Network Contro and Management (EE E694) Prof. A. A. Lazar Notes for the ecture of 7/Feb/95 by Huayan Wang (this document was ast LaT E X-ed on May 9,995) Queueing Primer for Muticass Optima

More information

arxiv: v2 [cond-mat.stat-mech] 14 Nov 2008

arxiv: v2 [cond-mat.stat-mech] 14 Nov 2008 Random Booean Networks Barbara Drosse Institute of Condensed Matter Physics, Darmstadt University of Technoogy, Hochschustraße 6, 64289 Darmstadt, Germany (Dated: June 27) arxiv:76.335v2 [cond-mat.stat-mech]

More information

THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE

THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE KATIE L. MAY AND MELISSA A. MITCHELL Abstract. We show how to identify the minima path network connecting three fixed points on

More information

Approximate Bandwidth Allocation for Fixed-Priority-Scheduled Periodic Resources (WSU-CS Technical Report Version)

Approximate Bandwidth Allocation for Fixed-Priority-Scheduled Periodic Resources (WSU-CS Technical Report Version) Approximate Bandwidth Aocation for Fixed-Priority-Schedued Periodic Resources WSU-CS Technica Report Version) Farhana Dewan Nathan Fisher Abstract Recent research in compositiona rea-time systems has focused

More information

arxiv: v1 [math.co] 17 Dec 2018

arxiv: v1 [math.co] 17 Dec 2018 On the Extrema Maximum Agreement Subtree Probem arxiv:1812.06951v1 [math.o] 17 Dec 2018 Aexey Markin Department of omputer Science, Iowa State University, USA amarkin@iastate.edu Abstract Given two phyogenetic

More information

A Branch and Cut Algorithm to Design. LDPC Codes without Small Cycles in. Communication Systems

A Branch and Cut Algorithm to Design. LDPC Codes without Small Cycles in. Communication Systems A Branch and Cut Agorithm to Design LDPC Codes without Sma Cyces in Communication Systems arxiv:1709.09936v1 [cs.it] 28 Sep 2017 Banu Kabakuak 1, Z. Caner Taşkın 1, and Ai Emre Pusane 2 1 Department of

More information

6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17. Solution 7

6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17. Solution 7 6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17 Soution 7 Probem 1: Generating Random Variabes Each part of this probem requires impementation in MATLAB. For the

More information

Minimizing Total Weighted Completion Time on Uniform Machines with Unbounded Batch

Minimizing Total Weighted Completion Time on Uniform Machines with Unbounded Batch The Eighth Internationa Symposium on Operations Research and Its Appications (ISORA 09) Zhangiaie, China, September 20 22, 2009 Copyright 2009 ORSC & APORC, pp. 402 408 Minimizing Tota Weighted Competion

More information

Asynchronous Control for Coupled Markov Decision Systems

Asynchronous Control for Coupled Markov Decision Systems INFORMATION THEORY WORKSHOP (ITW) 22 Asynchronous Contro for Couped Marov Decision Systems Michae J. Neey University of Southern Caifornia Abstract This paper considers optima contro for a coection of

More information

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity

More information

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems Componentwise Determination of the Interva Hu Soution for Linear Interva Parameter Systems L. V. Koev Dept. of Theoretica Eectrotechnics, Facuty of Automatics, Technica University of Sofia, 1000 Sofia,

More information

17 Lecture 17: Recombination and Dark Matter Production

17 Lecture 17: Recombination and Dark Matter Production PYS 652: Astrophysics 88 17 Lecture 17: Recombination and Dark Matter Production New ideas pass through three periods: It can t be done. It probaby can be done, but it s not worth doing. I knew it was

More information

Torsion and shear stresses due to shear centre eccentricity in SCIA Engineer Delft University of Technology. Marijn Drillenburg

Torsion and shear stresses due to shear centre eccentricity in SCIA Engineer Delft University of Technology. Marijn Drillenburg Torsion and shear stresses due to shear centre eccentricity in SCIA Engineer Deft University of Technoogy Marijn Drienburg October 2017 Contents 1 Introduction 2 1.1 Hand Cacuation....................................

More information

12.2. Maxima and Minima. Introduction. Prerequisites. Learning Outcomes

12.2. Maxima and Minima. Introduction. Prerequisites. Learning Outcomes Maima and Minima 1. Introduction In this Section we anayse curves in the oca neighbourhood of a stationary point and, from this anaysis, deduce necessary conditions satisfied by oca maima and oca minima.

More information

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM MIKAEL NILSSON, MATTIAS DAHL AND INGVAR CLAESSON Bekinge Institute of Technoogy Department of Teecommunications and Signa Processing

More information

Stochastic Variational Inference with Gradient Linearization

Stochastic Variational Inference with Gradient Linearization Stochastic Variationa Inference with Gradient Linearization Suppementa Materia Tobias Pötz * Anne S Wannenwetsch Stefan Roth Department of Computer Science, TU Darmstadt Preface In this suppementa materia,

More information

Candidate Number. General Certificate of Education Advanced Level Examination June 2010

Candidate Number. General Certificate of Education Advanced Level Examination June 2010 Centre Number Surname Candidate Number For Examiner s Use Other Names Candidate Signature Examiner s Initias Genera Certificate of Education Advanced Leve Examination June 2010 Question 1 2 Mark Physics

More information

Explicit overall risk minimization transductive bound

Explicit overall risk minimization transductive bound 1 Expicit overa risk minimization transductive bound Sergio Decherchi, Paoo Gastado, Sandro Ridea, Rodofo Zunino Dept. of Biophysica and Eectronic Engineering (DIBE), Genoa University Via Opera Pia 11a,

More information

A Simple and Efficient Algorithm of 3-D Single-Source Localization with Uniform Cross Array Bing Xue 1 2 a) * Guangyou Fang 1 2 b and Yicai Ji 1 2 c)

A Simple and Efficient Algorithm of 3-D Single-Source Localization with Uniform Cross Array Bing Xue 1 2 a) * Guangyou Fang 1 2 b and Yicai Ji 1 2 c) A Simpe Efficient Agorithm of 3-D Singe-Source Locaization with Uniform Cross Array Bing Xue a * Guangyou Fang b Yicai Ji c Key Laboratory of Eectromagnetic Radiation Sensing Technoogy, Institute of Eectronics,

More information

Iterative Decoding Performance Bounds for LDPC Codes on Noisy Channels

Iterative Decoding Performance Bounds for LDPC Codes on Noisy Channels Iterative Decoding Performance Bounds for LDPC Codes on Noisy Channes arxiv:cs/060700v1 [cs.it] 6 Ju 006 Chun-Hao Hsu and Achieas Anastasopouos Eectrica Engineering and Computer Science Department University

More information

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization Scaabe Spectrum ocation for Large Networks ased on Sparse Optimization innan Zhuang Modem R&D Lab Samsung Semiconductor, Inc. San Diego, C Dongning Guo, Ermin Wei, and Michae L. Honig Department of Eectrica

More information

Efficient Generation of Random Bits from Finite State Markov Chains

Efficient Generation of Random Bits from Finite State Markov Chains Efficient Generation of Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

A Novel Learning Method for Elman Neural Network Using Local Search

A Novel Learning Method for Elman Neural Network Using Local Search Neura Information Processing Letters and Reviews Vo. 11, No. 8, August 2007 LETTER A Nove Learning Method for Eman Neura Networ Using Loca Search Facuty of Engineering, Toyama University, Gofuu 3190 Toyama

More information

META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion

META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion : An Efficient Matching-Based Method for Error-Tolerant Autocompletion Dong Deng Guoliang Li He Wen H. V. Jagadish Jianhua Feng Department of Computer Science, Tsinghua National Laboratory for Information

More information

$, (2.1) n="# #. (2.2)

$, (2.1) n=# #. (2.2) Chapter. Eectrostatic II Notes: Most of the materia presented in this chapter is taken from Jackson, Chap.,, and 4, and Di Bartoo, Chap... Mathematica Considerations.. The Fourier series and the Fourier

More information

New Efficiency Results for Makespan Cost Sharing

New Efficiency Results for Makespan Cost Sharing New Efficiency Resuts for Makespan Cost Sharing Yvonne Beischwitz a, Forian Schoppmann a, a University of Paderborn, Department of Computer Science Fürstenaee, 3302 Paderborn, Germany Abstract In the context

More information

Cindy WANG CNNIC 5 Nov. 2009, Beijing

Cindy WANG CNNIC 5 Nov. 2009, Beijing Cindy WANG (wangxin@cnnic.cn), CNNIC 5 Nov. 009, Beijing Agenda Motivation What we want to earn? How we do it? Concusions and Future work Discussions Agenda Motivation What we want to earn? How we do it?

More information

Optimality of Inference in Hierarchical Coding for Distributed Object-Based Representations

Optimality of Inference in Hierarchical Coding for Distributed Object-Based Representations Optimaity of Inference in Hierarchica Coding for Distributed Object-Based Representations Simon Brodeur, Jean Rouat NECOTIS, Département génie éectrique et génie informatique, Université de Sherbrooke,

More information

Toward the Practical Use of Network Tomography for Internet Topology Discovery

Toward the Practical Use of Network Tomography for Internet Topology Discovery This fu text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for pubication in the IEEE INFOCOM 010 proceedings This paper was presented as part of the main

More information

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University Turbo Codes Coding and Communication Laboratory Dept. of Eectrica Engineering, Nationa Chung Hsing University Turbo codes 1 Chapter 12: Turbo Codes 1. Introduction 2. Turbo code encoder 3. Design of intereaver

More information

Integrating Factor Methods as Exponential Integrators

Integrating Factor Methods as Exponential Integrators Integrating Factor Methods as Exponentia Integrators Borisav V. Minchev Department of Mathematica Science, NTNU, 7491 Trondheim, Norway Borko.Minchev@ii.uib.no Abstract. Recenty a ot of effort has been

More information

A Transformation-based Framework for KNN Set Similarity Search

A Transformation-based Framework for KNN Set Similarity Search SUBMITTED TO IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 1 A Transformation-based Framewor for KNN Set Similarity Search Yong Zhang Member, IEEE, Jiacheng Wu, Jin Wang, Chunxiao Xing Member, IEEE

More information

Partial permutation decoding for MacDonald codes

Partial permutation decoding for MacDonald codes Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics

More information

IN MODERN embedded systems, the ever-increasing complexity

IN MODERN embedded systems, the ever-increasing complexity IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 37, NO., JANUARY 208 2 Muticore Mixed-Criticaity Systems: Partitioned Scheduing and Utiization Bound Jian-Jun Han, Member,

More information

Discrete Applied Mathematics

Discrete Applied Mathematics Discrete Appied Mathematics 159 (2011) 812 825 Contents ists avaiabe at ScienceDirect Discrete Appied Mathematics journa homepage: www.esevier.com/ocate/dam A direct barter mode for course add/drop process

More information

Copyright information to be inserted by the Publishers. Unsplitting BGK-type Schemes for the Shallow. Water Equations KUN XU

Copyright information to be inserted by the Publishers. Unsplitting BGK-type Schemes for the Shallow. Water Equations KUN XU Copyright information to be inserted by the Pubishers Unspitting BGK-type Schemes for the Shaow Water Equations KUN XU Mathematics Department, Hong Kong University of Science and Technoogy, Cear Water

More information

A Statistical Framework for Real-time Event Detection in Power Systems

A Statistical Framework for Real-time Event Detection in Power Systems 1 A Statistica Framework for Rea-time Event Detection in Power Systems Noan Uhrich, Tim Christman, Phiip Swisher, and Xichen Jiang Abstract A quickest change detection (QCD) agorithm is appied to the probem

More information

Ant Colony Algorithms for Constructing Bayesian Multi-net Classifiers

Ant Colony Algorithms for Constructing Bayesian Multi-net Classifiers Ant Coony Agorithms for Constructing Bayesian Muti-net Cassifiers Khaid M. Saama and Aex A. Freitas Schoo of Computing, University of Kent, Canterbury, UK. {kms39,a.a.freitas}@kent.ac.uk December 5, 2013

More information

SEMINAR 2. PENDULUMS. V = mgl cos θ. (2) L = T V = 1 2 ml2 θ2 + mgl cos θ, (3) d dt ml2 θ2 + mgl sin θ = 0, (4) θ + g l

SEMINAR 2. PENDULUMS. V = mgl cos θ. (2) L = T V = 1 2 ml2 θ2 + mgl cos θ, (3) d dt ml2 θ2 + mgl sin θ = 0, (4) θ + g l Probem 7. Simpe Penduum SEMINAR. PENDULUMS A simpe penduum means a mass m suspended by a string weightess rigid rod of ength so that it can swing in a pane. The y-axis is directed down, x-axis is directed

More information

Seasonal Time Series Data Forecasting by Using Neural Networks Multiscale Autoregressive Model

Seasonal Time Series Data Forecasting by Using Neural Networks Multiscale Autoregressive Model American ourna of Appied Sciences 7 (): 372-378, 2 ISSN 546-9239 2 Science Pubications Seasona Time Series Data Forecasting by Using Neura Networks Mutiscae Autoregressive Mode Suhartono, B.S.S. Uama and

More information

In-plane shear stiffness of bare steel deck through shell finite element models. G. Bian, B.W. Schafer. June 2017

In-plane shear stiffness of bare steel deck through shell finite element models. G. Bian, B.W. Schafer. June 2017 In-pane shear stiffness of bare stee deck through she finite eement modes G. Bian, B.W. Schafer June 7 COLD-FORMED STEEL RESEARCH CONSORTIUM REPORT SERIES CFSRC R-7- SDII Stee Diaphragm Innovation Initiative

More information

Lower Bounds for the Relative Greedy Algorithm for Approximating Steiner Trees

Lower Bounds for the Relative Greedy Algorithm for Approximating Steiner Trees This paper appeared in: Networks 47:2 (2006), -5 Lower Bounds for the Reative Greed Agorithm for Approimating Steiner Trees Stefan Hougard Stefan Kirchner Humbodt-Universität zu Berin Institut für Informatik

More information

Lecture 9. Stability of Elastic Structures. Lecture 10. Advanced Topic in Column Buckling

Lecture 9. Stability of Elastic Structures. Lecture 10. Advanced Topic in Column Buckling Lecture 9 Stabiity of Eastic Structures Lecture 1 Advanced Topic in Coumn Bucking robem 9-1: A camped-free coumn is oaded at its tip by a oad. The issue here is to find the itica bucking oad. a) Suggest

More information

FRST Multivariate Statistics. Multivariate Discriminant Analysis (MDA)

FRST Multivariate Statistics. Multivariate Discriminant Analysis (MDA) 1 FRST 531 -- Mutivariate Statistics Mutivariate Discriminant Anaysis (MDA) Purpose: 1. To predict which group (Y) an observation beongs to based on the characteristics of p predictor (X) variabes, using

More information

Gauss Law. 2. Gauss s Law: connects charge and field 3. Applications of Gauss s Law

Gauss Law. 2. Gauss s Law: connects charge and field 3. Applications of Gauss s Law Gauss Law 1. Review on 1) Couomb s Law (charge and force) 2) Eectric Fied (fied and force) 2. Gauss s Law: connects charge and fied 3. Appications of Gauss s Law Couomb s Law and Eectric Fied Couomb s

More information

A simple reliability block diagram method for safety integrity verification

A simple reliability block diagram method for safety integrity verification Reiabiity Engineering and System Safety 92 (2007) 1267 1273 www.esevier.com/ocate/ress A simpe reiabiity bock diagram method for safety integrity verification Haitao Guo, Xianhui Yang epartment of Automation,

More information

Centralized Coded Caching of Correlated Contents

Centralized Coded Caching of Correlated Contents Centraized Coded Caching of Correated Contents Qianqian Yang and Deniz Gündüz Information Processing and Communications Lab Department of Eectrica and Eectronic Engineering Imperia Coege London arxiv:1711.03798v1

More information