BDD-Based Analysis of Gapped q-gram Filters
|
|
- Silas Daniel
- 5 years ago
- Views:
Transcription
1 BDD-Based Anaysis of Gapped q-gram Fiters Marc Fontaine, Stefan Burkhardt 2 and Juha Kärkkäinen 2 Max-Panck-Institut für Informatik Stuhsatzenhausweg 85, 6623 Saarbrücken, Germany e-mai: stburk@mpi-sb.mpg.de e-mai: fontaine@studcs.uni-sb.de 2 Department of Computer Science P.O.Box 68 (Gustaf Häströmin katu 2 B) FI-4 University of Hesinki, Finand e-mai: Juha.Karkkainen@cs.hesinki.fi Abstract. Recenty, there has been a surge of interest in gapped q-gram fiters for approximate string matching. Important design parameters for fiters are for exampe the vaue of q, the fiter-threshod and in particuar the shape (aka seed) of the fiter. A good choice of parameters can improve the performance of a q-gram fiter by orders of magnitude and optimising these parameters is a nontrivia combinatoria probem. We describe a new method for anaysing gapped q-gram fiters. This method is simpe and generic. It appies to a variety of fiters, overcomes many restrictions that are present in existing agorithms and can easiy be extended to new fiter variants. To impement our approach, we use an extended version of BDDs (Binary Decision Diagrams), a data structure that efficienty represents sets of bit-strings. In a second step, we define a new cass of muti-shape fiters and anayse these fiters with the BDD-based approach. Experiments show that muti-shape fiters can outperform the best singe-shape fiters, which are currenty in use, in many aspects. The BDD-based agorithm is crucia for the design and anaysis of these new and better muti-shape fiters. Our resuts appy to the k-mismatches probem, i.e. approximate string matching with Hamming distance. Introduction String matching invoves searching a given string or textua database T for occurrences of substrings that match a search pattern P. The approximate string matching probem aows the search pattern and the matches to have some difference or distance according to a given distance function. Many appications depend on efficient soutions of this probem, especiay in the fied of bio-informatics, where databases may consist of sequences of 9 nuceotides of DNA or of ong sequences of amino acids. This work was conducted in part at the MPI for computer science, Saarbrücken with support from the Future and Emerging Technoogies programme of the EU under contract number IST (ALCOM-FT) and at the University of Hesinki supported by the Academy of Finand grant
2 BDD-Based Anaysis of Gapped q-gram Fiters Fiter agorithms are a common approach for approximate string matching. They speedup string matching by quicky generating a set of potentia matches and discarding the rest of the database. The true matches can then be found in a second step, the verification phase, by inspecting a potentia matches. Designing a good fiter usuay means optimising the tradeoff between the compexity of the fitration phase and the efficiency of the fiter. Many efficient fiters work with precomputed indexes, in particuar, indexes that are based on gapped q-grams or shapes. For exampe the three 3-grams of string ACAGCT for shape ##-# are AC-G,CA-C and AG-T. A matching pair of q-grams between a pattern and a substring of T is caed a hit. The q-gram index stores the positions of a q-grams of the database and aows to find hits efficienty. If the number of hits between the search pattern and a substring of the database exceeds a certain threshod t, that substring is caed a potentia match. As first shown in [6, 7], the performance of the fiter depends cruciay on the shape. Good shapes are found by anaysing arge sets of shapes, because no method for directy generating good shapes has been found yet. Even anaysing a singe shape is non-trivia and a ot of effort has gone into deveoping methods for this purpose. Recenty gapped shapes have been the focus of quite a bit of attention [6, 9,, 2]. In [6, 7], Burkhardt and Kärkkäinen compute the optima threshod. It is the highest threshod that sti aows the fiter to return a true matches, i.e., substrings that are within a fixed Hamming distance from the pattern. They aso compute a measure caed the minimum coverage, which provides a rough estimate of how many fase matches get through the fiter. The true positive rates and the fase positive rates were determined experimentay for seected shapes. If the threshod or Hamming distance is increased, the fiter aso discards some true matches, which are then caed fase negatives. In [5] an abstract measure for the fase negative probabiity, the so-caed recognition rate was defined and anaysed experimentay. The exact computation of both fase positive and fase negative rates was done by Ma, Tromp and Li [5]. However, their agorithms are restricted to fiters that count ony non-overapping hits. This is a significant restriction as the ower correation of overapping q-grams is a big advantage of gapped q-grams over ungapped ones [7]. Brejová, Brown and Vinař [, 2] deveop new variants of the agorithm of Ma, Tromp and Li. In [], a true match is defined not directy by a Hamming distance but as a probabiity distribution represented by a hidden Markov mode. In [2], they further generaise the approach to approximate hits and mutipe shapes. However, the restriction to non-overapping hits remains in a of their work. Very recenty, we became aware of severa papers using mutipe gapped shapes for approximate string matching in various different approaches [3, 7, 8, 4]. We present a new and fexibe method for computing various properties of q-gram fiters. This agorithm is based on a simpe natura abstraction of the probem and appies to a genera cass of fiters. At the same time, it overcomes many restrictions present in previous agorithms in particuar the non-overapping hit restriction. Our method consists of two steps. The first step is an agorithm based on sets of bit-strings. These sets can be of exponentia size and we have to use a compact and efficient representation of the sets to actuay impement our agorithm. A data 57
3 Proceedings of the Prague Stringoogy Conference 4 structure caed BDDs [3, 4] (Binary decision diagrams or Binary decomposition diagrams) impements such a representation of sets. Ony the use of a data structure ike BDDs makes it feasibe to run our agorithm. In the second step the BDDs generated in the first step can be used to efficienty compute interesting properties of the sets they represent. Most properties can be computed in inear time of the size of the BDDs. This cear spit of the probem into two steps distinguishes our method from previous agorithms, which were mosty based on dynamic programming. In the second part of this work, we appy our method to design new and better fiters. The basic idea in this part is to fiter with a set of shapes simutaneousy. Muti-shape fiters have been used before [8]. Our new idea is to use a carefuy seected set of shapes together with a specificay computed fitration criterion. This fitration criterion repaces the optima threshod of a singe shape fiter. The BDD-based agorithm aows us to compute the best fitration criterion for a set of shapes and at the same time determine the important quaity measures of the resuting muti-shape fiter. To investigate the potentia of muti-shape fiters based on a specific fitration criterion, we anayse arge sets of randomy generated fiters. These experiments show that good muti-shape fiters are very rare, but the experiments aso yied fiters that are superior to singe-shape fiters in severa important aspects. 2 Representing Match-Mismatch-Patterns with BDDs Let A and B be two strings of ength. We ca the bit-string p(a, B) {, } the match-mismatch-pattern of the two strings. A in p(a, B) denotes a matching position and a mismatch. The Hamming distance of A and B is then the number of zeros in p(a, B). We represent the number of zeros and ones in a bit-string p by p and p. The k-differences version of approximate string matching aows a pattern string and a match to have a Hamming distance of at most k. For most fiters, the match-mismatch-pattern of an aignment between the pattern string and the database at some position x contains enough information to decide whether x is returned as a potentia match or not. Therefore it is, in principe, sufficient to enumerate a possibe match-mismatch-patterns to anayse the performance of fiters for the k-differences probem. A drawback of this brute-force-approach is, that there are 2 possibe matchmismatch-patterns for strings of ength and reaistic fiters usuay work with pattern ength 5. To overcome this compexity-probem we use a data structure caed BDDs. BDDs aow a compact and efficient representation of sets of equa-ength bit-strings. They can be seen as an abstract data structure that supports the foowing operations. Creation of a new BDD for the base-cases and {ǫ}. Composition of two BDDs S = comp(s, S ) Decomposition of a BDD into two BDDs according to the first position of the bit-strings in the BDD. 58
4 BDD-Based Anaysis of Gapped q-gram Fiters Computing and of two BDDs and the compement of a BDD The composition comp(s, S ) represents the set: comp(s, S ) = {s (s = a a S ) (s = b b S )} For two sets A and B that are given as decompositions A = comp(a, A ) and B = comp(b, B ), A B and A B can be computed recursivey as: and: A B = comp(a B, A B ) A B = comp(a B, A B ) BDDs are impemented with DAGs (Directed Acycic Graphs) and they are simiar to finite automata without oops. In a BDD aways the minima, smaest possibe DAG is used to represent a set of bit-strings and equa sets are represented by one canonica node of the DAG. A coection of BDDs can share the structure of a singe DAG and BDD-impementations usuay make use of hash-tabes to maintain the canonica-representation-property. Hash tabes are aso used to avoid re-computations during the computation of and. ǫ ǫ ǫ A simpe BDD and the set it represents BDDs have many different appications where they often make it possibe to hande exponentia size sets within non-exponentia compexity. An introduction to BDDs can be found in [3, 4], where BDDs are used to represent booean functions over a finite set of variabes. The actua performance of BDDs in an appication depends on the structure of the sets they represent. For sets of size 2 the space compexity of BDDs can range from O() to Θ(2 ). It is important to note, that we use BBDs ony to anayse q-gram fiters. This is a one-time computation and the compexity of the BDDs does not interfere with the compexity of the fiters under consideration. The theoretica compexity of BDDs is therefore secondary in our appication. Our experience is, that BDDs work we to reduce the compexity of fiter anaysis. They make is possibe to anayse a interesting fiters within reasonabe time and space imits. In [] Fontaine described a extension of standard BDDs, which uses {, } as an additiona base case for the decomposition (so caed -BDDs). This extension aows more compact representation than standard BDDs and it was used for a our experiments. For code istings and runtime measurements of a prototype -BDD impementation see []. The prototype impementation ony consists of about kb of C++ code and computing the fiter properties for a typica shape ony takes a few seconds. 59
5 Proceedings of the Prague Stringoogy Conference 4 3 q-gram Simiarity-Based Fiters We use strings from {#,-} to denote different shapes. # stands for a position that must match, whereas - is a don t care or wid-card position. span(s) = s is the span of a shape s. Let A and B be two strings of ength and s a shape. A position i span(s) is a hit of shape s if n < span(s) : s(n) = # A(i + n) = B(i + n). The number of different hits of a shape s for two strings A and B is caed the q-gram simiarity qgs s (A, B) of the two strings. For strings of ength the q-gramsimiarity can be at most span(s) +. A= ACTGTACTGCCGTACT B= ACTGTAATGCAGTACT p(a, B)= shape s= ###---##--## qgs s (A, B)= 2 ACTGTACTGCCGTACT ###---?#--?# ###---##--## <- hit ###---##--## <- hit ###---#?--## ##?---?#--## ACTGTAATGCAGTACT Match-mismatch-pattern and q-gram-simiarity A q-gram fiter computes the set of potentia matches with the hep of a threshod t. A potentia match is the position of a substring in the database with a q-gram simiarity of at east t with the pattern string. Increasing the threshod of a fiter reduces the number of potentia matches at the cost of a decreased fiter sensitivity, i.e. the fiter is more ikey to overook true matches. The match-mismatch-pattern of two strings contains sufficient information to compute their q-gram-simiarity. Therefore a fiter can be anaysed by ooking at a possibe match-mismatch-patterns. We can partition the set of a possibe matchmismatch-patterns according to the q-gram-simiarity they represent for a given shape. For any fixed shape s and any h, N we define: P h = {p {, } s produces exacty h hits in p} It foows that the set P M of match-mismatch-pattern that represent a potentia match is: PM = P h h t A set P h can easiy be computed based on the sets P h if: and P h. A matchmismatch-pattern p {, } is in P h either: its suffix of ength is in P h position or: its suffix of ength is in P h. and it has an additiona hit of shape s at and it does not have an additiona hit at position This agorithm can be formuated as a simpe equation for sets P h = (expand(p h ) S (s)) (expand(p h ) S (s)) 6
6 BDD-Based Anaysis of Gapped q-gram Fiters with the foowing three definitions: S (s) = {p {, } s has a hit in p at position } S (s) = {p {, } s does not have a hit in p at position } expand(m) = {x x = m x = m, m M} BDDs directy support and, and expand(m) can be impemented as expand(m)= comp(m, M). BDDs aso support the creation of S (s) and S (s) for any shape s. S (s) can be computed recursivey as: {, } if s = ǫ if < span(s) S (s) = comp(, S (r)) if s = #r comp(s (r), S (r)) if s = -r Shape=#-## P4 = S 4 = {, } expand(p4 ) = {,,, } P5 = {,,,,, } P5 2 = {} P5 = {,,,...} P h and S (s) for shape #-## S {, } span(s) As an aternative to our definition of the q-gram-simiarity qgs(a, B) it is possibe to require individua hits to be non-overapping [5, ]. For such fiters the set PM can be computed with an agorithm simiar to the one described above. (Compute the sets P (h,i), where i is the offset of the first hit.) 4 Fiter Anaysis with BDDs The agorithm described in the previous section aows us to generate BDDrepresentations for the sets P h. These BDD-representations can be used to compute many interesting properties of the sets and thereby the underying fiters. Note that the computation of the various properties is independent of what fiter the sets P h represent and how they were computed. This is in contrast to previous approaches using dynamic programming where the fiter definition is deepy invoved in the property computation. 4. Specificity The specificity of a fiter describes its abiity to reduce a arge database to a sma set of potentia matches. For a given random mode, the fiter specificity is equivaent to the probabiity that a random substring of ength is a potentia match of a random search pattern. 6
7 Proceedings of the Prague Stringoogy Conference 4 Every match-mismatch-pattern p describes one possibe event that can occur whie aigning a database and a search pattern and we can use severa probabiity modes to assign probabiities to these events. We can then simpy extend these probabiities from one match-mismatch-pattern to sets of match-mismatch-patterns by summing up the probabiities of the eements of the sets. For exampe, to anayse a fiter for a DNA database, we might assume that the database and pattern string are independent random strings with an even distribution of the etters {A, C, G, T }. It foows that every singe character has a chance of 4 being a match and the probabiity of any match-mismatch-pattern p is: prob(p) = ( 4 ) p ( 3 4 ) p With this we can compute the probabiity of a potentia match, i.e the specificity of the fiters as: specificity = prob(p) p PM Given the binary decomposition comp(p, P ) of a set P the probabiity Prob(P) of the set is: Prob(P) = ( 3 4 ) Prob(P ) + ( 4 ) Prob(P ) The base-cases for the binary decomposition are aso the base-cases for this recursion: Prob( ) = Prob(ǫ) = This shows that, if BDDs are used to represent the sets, Prob(P) = p P prob(p) can be computed in inear time of the size of the BDDs. It can be seen that a simiar approach aows to compute the probabiities of sets for many different probabiity modes efficienty. In particuar it is aso possibe to use hidden Markov modes (HMMs) as probabiity mode. HMMs have been used in [] to mode rea DNA sequences of different species. 4.2 Recognition Rate For approximate string matching with Hamming distance we can define the recognition rate r(j) of a fiter as the expected fraction of potentia matches among substrings of the database with exacty Hamming distance j. The match-mismatch-patterns of ength and Hamming distance j can easiy be computed with the singe-character shape # as P j (#). It foows that a fiter with potentia matches P M has the recognition rate: r(j) = Prob(PM P j (#)) Prob(P j (#)) Recognition rates have been defined and determined experimentay in [5]. 4.3 Threshod The set of potentia matches of a fiter with shape s, and with it the recognition rates of the fiter, heaviy depends on the threshod t. 62
8 BDD-Based Anaysis of Gapped q-gram Fiters A fiter is ossess for a threshod t and Hamming distance k if j k : r(j) =, otherwise it is ossy. If one is interested in a fixed maxima Hamming distance k and ossess fitering, then there exists an optima threshod t best. A dynamic programming agorithm for computing t best is described in [7]. BDD-based threshod computation is aso possibe. For each set P h we compute: m(p) = min p P p We use the notation p for the number of occurrences of in string p. m(p h ) is the minimum number of mismatching positions of any match-mismatch-pattern p P h This minimum can be found in inear time in the size of the BDD. Any set P h with m(p h ) k contains at east one match-mismatch-pattern with Hamming distance at most k. The optima threshod t best for a ossess fiter is the smaest h such that m(p h ) k. shape s = #-#---#-#-#------# span(s) = 8 pattern ength = 5 number of hits h {,..., 33} h m(p h) k = 7 t best = k = 6 t best = 3 k = 5 t best = 6 k = 4 t best = 9 Computing the threshod t best. 5 Muti-shape Fiters Shapes can be better than contiguous q-grams because they introduce irreguarity in the way the mismatching positions affect the q-grams. For good shapes, ony a few worst case configurations of the mismatching characters affect many q-grams. A reasonabe approach to further improve the performance of fiters is therefore to use two or more somehow orthogona shapes in parae. The idea is, that those configurations of mismatches, that are particuary bad for one shape, are better covered by a second shape and vice versa. Designing a good muti-shape fiter is a nontrivia combinatoria probem, just ike finding good individua shapes. One coud assume that the best individua shapes aso form the best muti-shape fiter, however our experiments suggest that this is often not the case. Muti-shape fiters are the most important appication for our BDD-based approach. The extension of our agorithm to muti-shape fiters is straight-forward and it eads to a new concept: the generic fitration criterion C. The generic fitration criterion C repaces the threshod t of a singe-shape fiter. It enabes a muti-shape fiter to make fu use of the reations between the singe shapes. 63
9 Proceedings of the Prague Stringoogy Conference 4 A fiter with n shapes s...s n can use the q-gram simiarities h = qgs s (M, P)... h n = qgs sn (M, P) to decide whether M is a potentia match or not. (P is the pattern string and M is any substring of the database.) We ca a set C N n a fitration criterion for the shapes s...s n and define: M is a potentia match (h,...,h n ) C This generic fitration criterion C can mode many different strategies for mutishape fiters. For exampe it can mode fiters that require at east one hit of one shape, fiters the require one hit of each shape, fiters that sum up the hits of the shapes, or fiters that use each shape with its individua threshod t best. In Section 3 we used the notation P h (s) for the set of a match-mismatch-patterns with exacty h hits of a singe fixed shape s. To anayse muti-shape fiters we extend this notation to sets of shapes {s,...,s n }. We define P (h,...,h n) (s,...,s n ) as the set of a match match-mismatch-patterns with exacty h i hits of shape s i ( i n). The sets P (h,...,h n) (s,...,s n ) can be computed as: P (h,...,h n) (s,..., s n ) = P h i (s i ) i n With this, the set of match-mismatch-patterns, that represent a potentia match according to a fitration criterion C is: PM = (h,...,h n) C P (h,...,h n) (s,...,s n ) Together with the set P M, a statistica performance measures (recognition rate, specificity), which we computed for singe-shape fiters in Section 3, are now aso avaiabe for our mode of muti-shape fiters. The definition of P (h,...,h n) (s,...,s n ) aso makes it possibe to compute a optima fitration criterion C best for a ossess fiter with some fixed Hamming distance k. It is: C best = {(h... h n ) m(p (h,...,h n) (s,...,s n )) k} C best repaces the threshod t best of singe shape fiters. To reduce the high compexity invoved in the computation of C best Fontaine [] describes a straight forward approximation. 6 Designing Better Fiters The design of a fiter is aways a compromise between three objectives: high sensitivity fast fitration phase high specificity of the fiter, i.e. a fast verification phase 64
10 BDD-Based Anaysis of Gapped q-gram Fiters There are severa trade-offs between these objectives. For exampe, a higher sensitivity is usuay at the cost of a ower specificity and a faster fitration often yieds ower sensitivities and specificities [5, 5, 8]. Using a we chosen shape for the q-grams and the appropriate threshod can greaty improve overa fiter performance compared to fitering with ungapped q- grams [6, 7]. In this section we wi show that muti-shape fiters with a carefuy seected set of shapes and a specificay computed fitration criterion can further boost fiter performance for a three objectives compared to singe-shape fiters. A good estimate for the runtime of a q-gram fiter is the number of hits in the database that have to be processed. It is roughy proportiona to Σ q. (This assumes a database with a random distribution of etters from Σ and it is aso a good estimate for exampe for DNA sequences [7].) High vaues of q are desirabe because they make the fitration fast however they aso mean ower sensitivities. In this section we ony consider q-gram fiters that work ossess for a fixed Hamming distance k and we use k to compare the sensitivities of such fiters (a higher vaue of k means a higher sensitivity). To compare the specificities of different fiters, we aways use the shapes with the optima threshod t best (the optima fitration criterion C best for muti-shape fiters) that sti guarantees ossess fitering for the fixed k. For a experiments in this section, we use a pattern ength = 5 and assume a DNA-ike database with Σ = 4. There is a trade-off between k and the highest vaue of q that can be used for ossess fitering. For exampe for pattern ength = 5 and k = 5 the highest possibe q for a ossess singe-shape fiter is q =, for k = 6 it is q = 9. Simiar constraints between q and k aso exist for muti-shape fiters. However we found that they can have higher vaues of both q and k than is possibe for singe shapes. Therefore muti-shape fiters make it possibe to increase q, which makes them faster, or increase k, i.e the sensitivity. In some cases it is even possibe to increase q and k at the same time. This is not at the cost of a ower specificity, but instead it is even possibe to increase the specificity aso. Pairs of shapes: q =, k = 6 Consider for exampe the foowing three ossess two-shape fiters for k = 6: Three good two-shapes fiters k = 6, = 5, Σ = 4 s s 2 specificity a) ##-##---##-#### ###-#-###----#--## b) #-##-###-#### ####----###--##-# c) #-##-##--##### ####--#--#---##--## C best a) and c) {(, ), (, ), (, ), (, ), (2, )} not (, 2)! b) {(, ), (, ), (, 2), (, ), (, ), (2, )} The best singe shape fiter for this probem is ######-#-## with q = 9, t best = 2 and a specificity of Compared to this singe shape fiter, each of these three muti-shape fiters with two (q = )-shapes improves the specificity by a factor of 5. The runtime of a muti-shape fiter is approximatey the sum of the 65
11 Proceedings of the Prague Stringoogy Conference 4 run-times computed for its individua shapes. This means that the fiters with two (q = )-shapes for Σ = 4 are aso about two times as fast as a q = 9 singe-shape fiter. The three muti-shape fiters of this exampe were found by scanning 5 pairs of random (q = )-shapes. In this sampe set, good pairs were extremey rare of the pairs, i.e. more than two-thirds, did not yied a ossess fiter for k = 6 at a. It is interesting that none of the six shapes that comprise the three best pairs, we found, work particuary we as a singe-shape fiter. This suggests that combining good singe-shape fiters is not necessariy the best method to construct a good mutishape fiter. Aso note, that 5 pairs of shapes is a reativey sma random set. It is very ikey that much better two-shape fiters can be found with more extensive experiments. 4-tupes of shapes: q =, k = 8 In a second experiment we fixed q to and tried to increase k. We generated, 5, fiters with 4-tupes of random (q = )-shapes and anaysed each of these 4-tupes with our agorithm. In this sampe set, the good 4-tupe fiters were again rare. Nevertheess, we found 5 ossess fiters for k = 8 with specificities of about 3. For comparison, the highest possibe k for a ossess singe-shape fiter with q = 9 is k = 6. (A singe-shape fiter with q = 9, is about as fast as our 4-tupe fiters.) The fitration criterion of the 5 4-tupes fiters for k = 8 is that they require at east one hit of any of the four shapes. 4-tupes of shapes: q =, k = 7 Aternativey, each of the 5 good 4-tupe fiters, we found, can aso be used for k = 7 with a stricter fitration criterion. Athough the computation of the exact fitration criterion C best for this probem has a high compexity, it is easy to compute an suitabe approximation C approx []. The compement C approx of one such approximation consists of 4 eements. This fitration criterion C approx guarantees ossess fitering for k = 7 and a specificity of The best set of four shapes out of,5, random = 5, k = 7, specificity = e 8 s =##-#-###---#--#----## s 2 =##-#--#--#-#-#--### s 3 =###-#-##-#-### s 4 =##-###-----##--#-#--# C for the best 4-tupe fiter and k = 7 {(,,, ), (,,, ), (,,, 2), (,,, 3),(,,, 4), (,,, 5), (,,,), (,,, ), (,,, 2), (,, 2, ), (,, 2, ),(,,3, ),(,, 3, ), (,, 4,), (,, 4, ), (,, 5, ), (,,, ), (,,, ),(,,, 2),(,,, ), (,,,), (,, 2, ), (,, 3, ), (,, 4, ), (, 2,, ),(, 2,, ),(, 2,, ), (,3,,), (,,, ), (,,, ), (,,, 2), (,,, 4),(,,, ),(,, 2, ), (,, 4,), (,,, ), (,,, ), (, 2,, ), (2,,,), (5,,,)} The experiments show that muti-shape fiters can have significanty better specificities and work for higher vaues of k than singe-shape fiters. At the same time they can aso speed up the fitration. It remains an open question if there is an agorithm to construct good sets of shapes for muti-shape fiters. 66
12 BDD-Based Anaysis of Gapped q-gram Fiters 7 Concusion We described a new method for the anaysis of gapped q-gram fiters. This method uses bit-strings, we ca them match-mismatch-patterns, to describe possibe aignments between the database and the search patterns. Sets of match-mismatchpatterns provide a simpe abstraction of fiter agorithms for the k-differences probem. The first step of our approach is to generate sets of match-mismatch-patterns, in particuar the set of match-mismatch-patterns representing the potentia matches. To impement this step efficienty, we use BDDs as a data structure to represent sets of bit-strings. In the second step, we can then use these BDD representations to compute many interesting properties of fiters ike the recognition rate and specificity for various probabiity modes. Our approach is simpe and genera and appies to a variety of fiter agorithms. For exampe, it can mode singe-shape fiters with any threshod and generic mutishape fiters. Previous agorithms for fiter-reated probems were often based on dynamic programming. Compared to dynamic programming, our approach is more genera and more natura and aows many interesting extensions. The most important appication of our approach is the anaysis of muti-shape fiters, which work with a set of shapes in parae. For any set of shapes, our approach can compute an optima fitration criterion C best, which guarantees ossess fitering for the k-differences probem and aso the sensitivities and specificities of the resuting muti-shape fiter. We found, that good muti-shape fiters with a carefuy seected set of shapes and a specificay computed fitration criterion C best are much better than singe-shape fiters. They aow higher specificities and sensitivities than singe shape fiters and higher vaues of k are possibe (for ossess fitering). Muti-shape fiters can aso be faster than singe shape fiters, because they sti work with higher vaues of q. The BDD-based approach makes it possibe to find good muti-shape fiters by scanning a arge number of randomy generated candidates. However, ony a sma fraction of these candidates show the desired properties. Since fu enumeration as for singe-shape fiters [6] is not possibe for muti-shape fiters, a constructive agorithm to generate good sets of shapes remains an interesting open probem. References [] B. Brejová, D. G. Brown, and T. Vinař. Optima spaced seeds for hidden Markow modes, with appications to homoogous coding regions. In Proc. 4th Annua Symposium on Combinatoria Pattern Matching, voume 2676 of LNCS, pages Springer, 23. [2] B. Brejová, D. G. Brown, and T. Vinař. Vector seeds: an extension to spaced seeds aows substantia improvements in sensitivity and specificity. In Proc. 3rd Internationa Workshop on Agorithms and Bioinformatics, voume 282 of Lecture Notes in Bioinformatics, pages Springer, 23. [3] R. E. Bryant. Graph-based agorithms for booean function manipuation. IEEE Transactions on Computers, 35:677 69, 986(8). 67
13 Proceedings of the Prague Stringoogy Conference 4 [4] R. E. Bryant. Symboic booean manipuation with ordered binary-decision diagrams. ACM Computing Surveys, 24:293 38, 992(3). [5] S. Burkhardt. Fiter Agorithms for Approximate String Matching. PhD thesis, Department of Computer Science, Saarand University, stburk/thesis.ps. [6] S. Burkhardt and J. Kärkkäinen. Better fitering with gapped q-grams. In Proc. 2th Annua Symposium on Combinatoria Pattern Matching, voume 289 of LNCS, pages Springer, 2. [7] S. Burkhardt and J. Kärkkäinen. Better fitering with gapped q-grams. Fundamenta Informaticae, 56( 2):5 7, 23. [8] A. Caifano and I. Rigoutsos. FLASH: A fast ook-up agorithm for string homoogy. In Proc. st Internationa Conference on Inteigent Systems for Moecuar Bioogy, pages AAAI Press, 993. [9] K. P. Choi, F. Zeng, and L. Zhang. Good spaced seeds for homoogy search. Bioinformatics, 2(7):54 59, 24. [] K. P. Choi and L. Zhang. Sensitivity anaysis and efficient method for identifying optima spaced seeds. Journa of Computer and System Sciences, 68:22 4, 24. [] M. Fontaine. Computing the fitration efficiency of shape-index-fiters for approximate string matching. Master s thesis, Dept. of Computer Science, Saarand University, Nov fontaine/thesis.ps. [2] U. Keich, M. Li, B. Ma, and J. Tromp. On spaced seeds for simiarity search. Discrete Appied Mathematics, 38(3): , 24. [3] G. Kucherov, L. Noé, and M. Roytberg. Muti-seed ossess fitration. To appear in CPM 24. [4] M. Li, B. Ma, D. Kisman, and J. Tromp. PatternHunter II: Highy Sensitive and Fast Homoogy Search. Journa of Bioinformatics and Computationa Bioogy, 24. To appear. Eary version in GIW 23. [5] B. Ma, J. Tromp, and M. Li. Patternhunter: faster and more sensitive homoogy search. Bioinformatics, 8:44 445, 22. [6] L. Noè and G. Kucherov. YASS: Simiarity search in DNA sequences. Technica report, INRIA Tech report 4852, 23. [7] Y. Sun and J. Buher. Designing mutipe simutaneous seeds for DNA simiarity search. In Proceedings of the eighth annua internationa conference on Computationa moecuar bioogy, pages 76 84, 24. [8] J. Xu, D. Brown, M. Li, and B. Ma. Optimizing mutipe spaced seeds for homoogy search. To appear in CPM
A Brief Introduction to Markov Chains and Hidden Markov Models
A Brief Introduction to Markov Chains and Hidden Markov Modes Aen B MacKenzie Notes for December 1, 3, &8, 2015 Discrete-Time Markov Chains You may reca that when we first introduced random processes,
More informationCryptanalysis of PKP: A New Approach
Cryptanaysis of PKP: A New Approach Éiane Jaumes and Antoine Joux DCSSI 18, rue du Dr. Zamenhoff F-92131 Issy-es-Mx Cedex France eiane.jaumes@wanadoo.fr Antoine.Joux@ens.fr Abstract. Quite recenty, in
More informationBayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with?
Bayesian Learning A powerfu and growing approach in machine earning We use it in our own decision making a the time You hear a which which coud equay be Thanks or Tanks, which woud you go with? Combine
More informationEfficiently Generating Random Bits from Finite State Markov Chains
1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown
More informationA. Distribution of the test statistic
A. Distribution of the test statistic In the sequentia test, we first compute the test statistic from a mini-batch of size m. If a decision cannot be made with this statistic, we keep increasing the mini-batch
More informationMARKOV CHAINS AND MARKOV DECISION THEORY. Contents
MARKOV CHAINS AND MARKOV DECISION THEORY ARINDRIMA DATTA Abstract. In this paper, we begin with a forma introduction to probabiity and expain the concept of random variabes and stochastic processes. After
More informationCombining reaction kinetics to the multi-phase Gibbs energy calculation
7 th European Symposium on Computer Aided Process Engineering ESCAPE7 V. Pesu and P.S. Agachi (Editors) 2007 Esevier B.V. A rights reserved. Combining reaction inetics to the muti-phase Gibbs energy cacuation
More informationPartial permutation decoding for MacDonald codes
Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics
More informationII. PROBLEM. A. Description. For the space of audio signals
CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time
More informationA Solution to the 4-bit Parity Problem with a Single Quaternary Neuron
Neura Information Processing - Letters and Reviews Vo. 5, No. 2, November 2004 LETTER A Soution to the 4-bit Parity Probem with a Singe Quaternary Neuron Tohru Nitta Nationa Institute of Advanced Industria
More information8 Digifl'.11 Cth:uits and devices
8 Digif'. Cth:uits and devices 8. Introduction In anaog eectronics, votage is a continuous variabe. This is usefu because most physica quantities we encounter are continuous: sound eves, ight intensity,
More informationTHE OUT-OF-PLANE BEHAVIOUR OF SPREAD-TOW FABRICS
ECCM6-6 TH EUROPEAN CONFERENCE ON COMPOSITE MATERIALS, Sevie, Spain, -6 June 04 THE OUT-OF-PLANE BEHAVIOUR OF SPREAD-TOW FABRICS M. Wysocki a,b*, M. Szpieg a, P. Heström a and F. Ohsson c a Swerea SICOMP
More informationTarget Location Estimation in Wireless Sensor Networks Using Binary Data
Target Location stimation in Wireess Sensor Networks Using Binary Data Ruixin Niu and Pramod K. Varshney Department of ectrica ngineering and Computer Science Link Ha Syracuse University Syracuse, NY 344
More informationEfficient Generation of Random Bits from Finite State Markov Chains
Efficient Generation of Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown
More informationASummaryofGaussianProcesses Coryn A.L. Bailer-Jones
ASummaryofGaussianProcesses Coryn A.L. Baier-Jones Cavendish Laboratory University of Cambridge caj@mrao.cam.ac.uk Introduction A genera prediction probem can be posed as foows. We consider that the variabe
More informationOptimality of Inference in Hierarchical Coding for Distributed Object-Based Representations
Optimaity of Inference in Hierarchica Coding for Distributed Object-Based Representations Simon Brodeur, Jean Rouat NECOTIS, Département génie éectrique et génie informatique, Université de Sherbrooke,
More informationSTA 216 Project: Spline Approach to Discrete Survival Analysis
: Spine Approach to Discrete Surviva Anaysis November 4, 005 1 Introduction Athough continuous surviva anaysis differs much from the discrete surviva anaysis, there is certain ink between the two modeing
More informationNEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION
NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION Hsiao-Chang Chen Dept. of Systems Engineering University of Pennsyvania Phiadephia, PA 904-635, U.S.A. Chun-Hung Chen
More informationAn Algorithm for Pruning Redundant Modules in Min-Max Modular Network
An Agorithm for Pruning Redundant Modues in Min-Max Moduar Network Hui-Cheng Lian and Bao-Liang Lu Department of Computer Science and Engineering, Shanghai Jiao Tong University 1954 Hua Shan Rd., Shanghai
More informationExplicit overall risk minimization transductive bound
1 Expicit overa risk minimization transductive bound Sergio Decherchi, Paoo Gastado, Sandro Ridea, Rodofo Zunino Dept. of Biophysica and Eectronic Engineering (DIBE), Genoa University Via Opera Pia 11a,
More informationCount-Min Sketches for Estimating Password Frequency within Hamming Distance Two
Count-Min Sketches for Estimating Password Frequency within Hamming Distance Two Leah South and Dougas Stebia Schoo of Mathematica Sciences, Queensand University of Technoogy, Brisbane, Queensand, Austraia
More informationApproximated MLC shape matrix decomposition with interleaf collision constraint
Approximated MLC shape matrix decomposition with intereaf coision constraint Thomas Kainowski Antje Kiese Abstract Shape matrix decomposition is a subprobem in radiation therapy panning. A given fuence
More informationXSAT of linear CNF formulas
XSAT of inear CN formuas Bernd R. Schuh Dr. Bernd Schuh, D-50968 Kön, Germany; bernd.schuh@netcoogne.de eywords: compexity, XSAT, exact inear formua, -reguarity, -uniformity, NPcompeteness Abstract. Open
More informationCollective organization in an adaptative mixture of experts
Coective organization in an adaptative mixture of experts Vincent Vigneron, Christine Fuchen, Jean-Marc Martinez To cite this version: Vincent Vigneron, Christine Fuchen, Jean-Marc Martinez. Coective organization
More informationStochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract
Stochastic Compement Anaysis of Muti-Server Threshod Queues with Hysteresis John C.S. Lui The Dept. of Computer Science & Engineering The Chinese University of Hong Kong Leana Goubchik Dept. of Computer
More informationAsymptotic Properties of a Generalized Cross Entropy Optimization Algorithm
1 Asymptotic Properties of a Generaized Cross Entropy Optimization Agorithm Zijun Wu, Michae Koonko, Institute for Appied Stochastics and Operations Research, Caustha Technica University Abstract The discrete
More informationStatistical Learning Theory: A Primer
Internationa Journa of Computer Vision 38(), 9 3, 2000 c 2000 uwer Academic Pubishers. Manufactured in The Netherands. Statistica Learning Theory: A Primer THEODOROS EVGENIOU, MASSIMILIANO PONTIL AND TOMASO
More informationInductive Bias: How to generalize on novel data. CS Inductive Bias 1
Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow
More informationCS229 Lecture notes. Andrew Ng
CS229 Lecture notes Andrew Ng Part IX The EM agorithm In the previous set of notes, we taked about the EM agorithm as appied to fitting a mixture of Gaussians. In this set of notes, we give a broader view
More informationSeparation of Variables and a Spherical Shell with Surface Charge
Separation of Variabes and a Spherica She with Surface Charge In cass we worked out the eectrostatic potentia due to a spherica she of radius R with a surface charge density σθ = σ cos θ. This cacuation
More informationStochastic Automata Networks (SAN) - Modelling. and Evaluation. Paulo Fernandes 1. Brigitte Plateau 2. May 29, 1997
Stochastic utomata etworks (S) - Modeing and Evauation Pauo Fernandes rigitte Pateau 2 May 29, 997 Institut ationa Poytechnique de Grenobe { IPG Ecoe ationae Superieure d'informatique et de Mathematiques
More information#A48 INTEGERS 12 (2012) ON A COMBINATORIAL CONJECTURE OF TU AND DENG
#A48 INTEGERS 12 (2012) ON A COMBINATORIAL CONJECTURE OF TU AND DENG Guixin Deng Schoo of Mathematica Sciences, Guangxi Teachers Education University, Nanning, P.R.China dengguixin@ive.com Pingzhi Yuan
More informationUniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete
Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity
More informationarxiv: v1 [math.ca] 6 Mar 2017
Indefinite Integras of Spherica Besse Functions MIT-CTP/487 arxiv:703.0648v [math.ca] 6 Mar 07 Joyon K. Boomfied,, Stephen H. P. Face,, and Zander Moss, Center for Theoretica Physics, Laboratory for Nucear
More informationApproximated MLC shape matrix decomposition with interleaf collision constraint
Agorithmic Operations Research Vo.4 (29) 49 57 Approximated MLC shape matrix decomposition with intereaf coision constraint Antje Kiese and Thomas Kainowski Institut für Mathematik, Universität Rostock,
More informationTraffic data collection
Chapter 32 Traffic data coection 32.1 Overview Unike many other discipines of the engineering, the situations that are interesting to a traffic engineer cannot be reproduced in a aboratory. Even if road
More informationA Novel Learning Method for Elman Neural Network Using Local Search
Neura Information Processing Letters and Reviews Vo. 11, No. 8, August 2007 LETTER A Nove Learning Method for Eman Neura Networ Using Loca Search Facuty of Engineering, Toyama University, Gofuu 3190 Toyama
More informationTwo view learning: SVM-2K, Theory and Practice
Two view earning: SVM-2K, Theory and Practice Jason D.R. Farquhar jdrf99r@ecs.soton.ac.uk Hongying Meng hongying@cs.york.ac.uk David R. Hardoon drh@ecs.soton.ac.uk John Shawe-Tayor jst@ecs.soton.ac.uk
More informationImproving the Accuracy of Boolean Tomography by Exploiting Path Congestion Degrees
Improving the Accuracy of Booean Tomography by Expoiting Path Congestion Degrees Zhiyong Zhang, Gaoei Fei, Fucai Yu, Guangmin Hu Schoo of Communication and Information Engineering, University of Eectronic
More informationParagraph Topic Classification
Paragraph Topic Cassification Eugene Nho Graduate Schoo of Business Stanford University Stanford, CA 94305 enho@stanford.edu Edward Ng Department of Eectrica Engineering Stanford University Stanford, CA
More informationOn the Goal Value of a Boolean Function
On the Goa Vaue of a Booean Function Eric Bach Dept. of CS University of Wisconsin 1210 W. Dayton St. Madison, WI 53706 Lisa Heerstein Dept of CSE NYU Schoo of Engineering 2 Metrotech Center, 10th Foor
More informationhttps://doi.org/ /epjconf/
HOW TO APPLY THE OPTIMAL ESTIMATION METHOD TO YOUR LIDAR MEASUREMENTS FOR IMPROVED RETRIEVALS OF TEMPERATURE AND COMPOSITION R. J. Sica 1,2,*, A. Haefee 2,1, A. Jaai 1, S. Gamage 1 and G. Farhani 1 1 Department
More informationMelodic contour estimation with B-spline models using a MDL criterion
Meodic contour estimation with B-spine modes using a MDL criterion Damien Loive, Ney Barbot, Oivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-305 Lannion Cedex
More informationExpectation-Maximization for Estimating Parameters for a Mixture of Poissons
Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Brandon Maone Department of Computer Science University of Hesini February 18, 2014 Abstract This document derives, in excrutiating
More informationAST 418/518 Instrumentation and Statistics
AST 418/518 Instrumentation and Statistics Cass Website: http://ircamera.as.arizona.edu/astr_518 Cass Texts: Practica Statistics for Astronomers, J.V. Wa, and C.R. Jenkins, Second Edition. Measuring the
More informationDetermining The Degree of Generalization Using An Incremental Learning Algorithm
Determining The Degree of Generaization Using An Incrementa Learning Agorithm Pabo Zegers Facutad de Ingeniería, Universidad de os Andes San Caros de Apoquindo 22, Las Condes, Santiago, Chie pzegers@uandes.c
More informationChemical Kinetics Part 2
Integrated Rate Laws Chemica Kinetics Part 2 The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates the rate
More informationPairwise RNA Edit Distance
Pairwise RNA Edit Distance In the foowing: Sequences S 1 and S 2 associated structures P 1 and P 2 scoring of aignment: different edit operations arc atering arc removing 1) ACGUUGACUGACAACAC..(((...)))...
More informationMATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES
MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES Separation of variabes is a method to sove certain PDEs which have a warped product structure. First, on R n, a inear PDE of order m is
More informationNIKOS FRANTZIKINAKIS. N n N where (Φ N) N N is any Følner sequence
SOME OPE PROBLEMS O MULTIPLE ERGODIC AVERAGES IKOS FRATZIKIAKIS. Probems reated to poynomia sequences In this section we give a ist of probems reated to the study of mutipe ergodic averages invoving iterates
More informationGeneral Certificate of Education Advanced Level Examination June 2010
Genera Certificate of Education Advanced Leve Examination June 2010 Human Bioogy HBI6T/Q10/task Unit 6T A2 Investigative Skis Assignment Task Sheet The effect of using one or two eyes on the perception
More informationA Statistical Framework for Real-time Event Detection in Power Systems
1 A Statistica Framework for Rea-time Event Detection in Power Systems Noan Uhrich, Tim Christman, Phiip Swisher, and Xichen Jiang Abstract A quickest change detection (QCD) agorithm is appied to the probem
More informationHaar Decomposition and Reconstruction Algorithms
Jim Lambers MAT 773 Fa Semester 018-19 Lecture 15 and 16 Notes These notes correspond to Sections 4.3 and 4.4 in the text. Haar Decomposition and Reconstruction Agorithms Decomposition Suppose we approximate
More informationNew Efficiency Results for Makespan Cost Sharing
New Efficiency Resuts for Makespan Cost Sharing Yvonne Beischwitz a, Forian Schoppmann a, a University of Paderborn, Department of Computer Science Fürstenaee, 3302 Paderborn, Germany Abstract In the context
More informationSequential Decoding of Polar Codes with Arbitrary Binary Kernel
Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient
More informationFrom Margins to Probabilities in Multiclass Learning Problems
From Margins to Probabiities in Muticass Learning Probems Andrea Passerini and Massimiiano Ponti 2 and Paoo Frasconi 3 Abstract. We study the probem of muticass cassification within the framework of error
More informationarxiv: v1 [cs.lg] 31 Oct 2017
ACCELERATED SPARSE SUBSPACE CLUSTERING Abofaz Hashemi and Haris Vikao Department of Eectrica and Computer Engineering, University of Texas at Austin, Austin, TX, USA arxiv:7.26v [cs.lg] 3 Oct 27 ABSTRACT
More informationarxiv: v1 [cs.ds] 12 Nov 2018
Quantum-inspired ow-rank stochastic regression with ogarithmic dependence on the dimension András Giyén 1, Seth Loyd Ewin Tang 3 November 13, 018 arxiv:181104909v1 [csds] 1 Nov 018 Abstract We construct
More informationDIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM
DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM MIKAEL NILSSON, MATTIAS DAHL AND INGVAR CLAESSON Bekinge Institute of Technoogy Department of Teecommunications and Signa Processing
More informationGeneral Certificate of Education Advanced Level Examination June 2010
Genera Certificate of Education Advanced Leve Examination June 2010 Human Bioogy HBI6T/P10/task Unit 6T A2 Investigative Skis Assignment Task Sheet The effect of temperature on the rate of photosynthesis
More informationHigh Spectral Resolution Infrared Radiance Modeling Using Optimal Spectral Sampling (OSS) Method
High Spectra Resoution Infrared Radiance Modeing Using Optima Spectra Samping (OSS) Method J.-L. Moncet and G. Uymin Background Optima Spectra Samping (OSS) method is a fast and accurate monochromatic
More informationC. Fourier Sine Series Overview
12 PHILIP D. LOEWEN C. Fourier Sine Series Overview Let some constant > be given. The symboic form of the FSS Eigenvaue probem combines an ordinary differentia equation (ODE) on the interva (, ) with a
More informationTwo Birds With One Stone: An Efficient Hierarchical Framework for Top-k and Threshold-based String Similarity Search
Two Birds With One Stone: An Efficient Hierarchica Framework for Top-k and Threshod-based String Simiarity Search Jin Wang Guoiang Li Dong Deng Yong Zhang Jianhua Feng Department of Computer Science and
More informationAsynchronous Control for Coupled Markov Decision Systems
INFORMATION THEORY WORKSHOP (ITW) 22 Asynchronous Contro for Couped Marov Decision Systems Michae J. Neey University of Southern Caifornia Abstract This paper considers optima contro for a coection of
More informationFRST Multivariate Statistics. Multivariate Discriminant Analysis (MDA)
1 FRST 531 -- Mutivariate Statistics Mutivariate Discriminant Anaysis (MDA) Purpose: 1. To predict which group (Y) an observation beongs to based on the characteristics of p predictor (X) variabes, using
More information<C 2 2. λ 2 l. λ 1 l 1 < C 1
Teecommunication Network Contro and Management (EE E694) Prof. A. A. Lazar Notes for the ecture of 7/Feb/95 by Huayan Wang (this document was ast LaT E X-ed on May 9,995) Queueing Primer for Muticass Optima
More informationThe EM Algorithm applied to determining new limit points of Mahler measures
Contro and Cybernetics vo. 39 (2010) No. 4 The EM Agorithm appied to determining new imit points of Maher measures by Souad E Otmani, Georges Rhin and Jean-Marc Sac-Épée Université Pau Veraine-Metz, LMAM,
More informationMaintenance activities planning and grouping for complex structure systems
Maintenance activities panning and grouping for compex structure systems Hai Canh u, Phuc Do an, Anne Barros, Christophe Bérenguer To cite this version: Hai Canh u, Phuc Do an, Anne Barros, Christophe
More informationConsistent linguistic fuzzy preference relation with multi-granular uncertain linguistic information for solving decision making problems
Consistent inguistic fuzzy preference reation with muti-granuar uncertain inguistic information for soving decision making probems Siti mnah Binti Mohd Ridzuan, and Daud Mohamad Citation: IP Conference
More informationFeasible Itemset Distributions in Data Mining: Theory and Application
Feasibe Itemset Distributions in Data Mining: Theory and Appication Ganesh Ramesh Dept. of Computer Science University at Abany, SUNY Abany, NY 12222, USA ganesh@cs.abany.edu Wiiam A. Maniatty Dept. of
More informationThe influence of temperature of photovoltaic modules on performance of solar power plant
IOSR Journa of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vo. 05, Issue 04 (Apri. 2015), V1 PP 09-15 www.iosrjen.org The infuence of temperature of photovotaic modues on performance
More informationFormulas for Angular-Momentum Barrier Factors Version II
BNL PREPRINT BNL-QGS-06-101 brfactor1.tex Formuas for Anguar-Momentum Barrier Factors Version II S. U. Chung Physics Department, Brookhaven Nationa Laboratory, Upton, NY 11973 March 19, 2015 abstract A
More informationScalable Spectrum Allocation for Large Networks Based on Sparse Optimization
Scaabe Spectrum ocation for Large Networks ased on Sparse Optimization innan Zhuang Modem R&D Lab Samsung Semiconductor, Inc. San Diego, C Dongning Guo, Ermin Wei, and Michae L. Honig Department of Eectrica
More informationMULTI-PERIOD MODEL FOR PART FAMILY/MACHINE CELL FORMATION. Objectives included in the multi-period formulation
ationa Institute of Technoogy aicut Department of echanica Engineering ULTI-PERIOD ODEL FOR PART FAILY/AHIE ELL FORATIO Given a set of parts, processing requirements, and avaiabe resources The objective
More informationChemical Kinetics Part 2. Chapter 16
Chemica Kinetics Part 2 Chapter 16 Integrated Rate Laws The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates
More informationPower Control and Transmission Scheduling for Network Utility Maximization in Wireless Networks
ower Contro and Transmission Scheduing for Network Utiity Maximization in Wireess Networks Min Cao, Vivek Raghunathan, Stephen Hany, Vinod Sharma and. R. Kumar Abstract We consider a joint power contro
More informationIntegrating Factor Methods as Exponential Integrators
Integrating Factor Methods as Exponentia Integrators Borisav V. Minchev Department of Mathematica Science, NTNU, 7491 Trondheim, Norway Borko.Minchev@ii.uib.no Abstract. Recenty a ot of effort has been
More informationAdjustment of automatic control systems of production facilities at coal processing plants using multivariant physico- mathematical models
IO Conference Series: Earth and Environmenta Science AER OEN ACCESS Adjustment of automatic contro systems of production faciities at coa processing pants using mutivariant physico- mathematica modes To
More informationConstruction of Supersaturated Design with Large Number of Factors by the Complementary Design Method
Acta Mathematicae Appicatae Sinica, Engish Series Vo. 29, No. 2 (2013) 253 262 DOI: 10.1007/s10255-013-0214-6 http://www.appmath.com.cn & www.springerlink.com Acta Mathema cae Appicatae Sinica, Engish
More informationMODELING OF A THREE-PHASE APPLICATION OF A MAGNETIC AMPLIFIER
TH INTERNATIONAL CONGRESS OF THE AERONAUTICAL SCIENCES MODELING OF A THREE-PHASE APPLICATION OF A MAGNETIC AMPLIFIER L. Austrin*, G. Engdah** * Saab AB, Aerosystems, S-81 88 Linköping, Sweden, ** Roya
More informationOn a geometrical approach in contact mechanics
Institut für Mechanik On a geometrica approach in contact mechanics Aexander Konyukhov, Kar Schweizerhof Universität Karsruhe, Institut für Mechanik Institut für Mechanik Kaiserstr. 12, Geb. 20.30 76128
More informationAutomobile Prices in Market Equilibrium. Berry, Pakes and Levinsohn
Automobie Prices in Market Equiibrium Berry, Pakes and Levinsohn Empirica Anaysis of demand and suppy in a differentiated products market: equiibrium in the U.S. automobie market. Oigopoistic Differentiated
More informationTesting for the Existence of Clusters
Testing for the Existence of Custers Caudio Fuentes and George Casea University of Forida November 13, 2008 Abstract The detection and determination of custers has been of specia interest, among researchers
More informationOptimal spaced seeds for faster approximate string matching
Optimal spaced seeds for faster approximate string matching Martin Farach-Colton Gad M. Landau S. Cenk Sahinalp Dekel Tsur Abstract Filtering is a standard technique for fast approximate string matching
More informationFast Blind Recognition of Channel Codes
Fast Bind Recognition of Channe Codes Reza Moosavi and Erik G. Larsson Linköping University Post Print N.B.: When citing this work, cite the origina artice. 213 IEEE. Persona use of this materia is permitted.
More informationSome Measures for Asymmetry of Distributions
Some Measures for Asymmetry of Distributions Georgi N. Boshnakov First version: 31 January 2006 Research Report No. 5, 2006, Probabiity and Statistics Group Schoo of Mathematics, The University of Manchester
More informationOn colorings of the Boolean lattice avoiding a rainbow copy of a poset arxiv: v1 [math.co] 21 Dec 2018
On coorings of the Booean attice avoiding a rainbow copy of a poset arxiv:1812.09058v1 [math.co] 21 Dec 2018 Baázs Patkós Afréd Rényi Institute of Mathematics, Hungarian Academy of Scinces H-1053, Budapest,
More informationarxiv: v1 [cs.db] 1 Aug 2012
Functiona Mechanism: Regression Anaysis under Differentia Privacy arxiv:208.029v [cs.db] Aug 202 Jun Zhang Zhenjie Zhang 2 Xiaokui Xiao Yin Yang 2 Marianne Winsett 2,3 ABSTRACT Schoo of Computer Engineering
More informationStatistical Learning Theory: a Primer
??,??, 1 6 (??) c?? Kuwer Academic Pubishers, Boston. Manufactured in The Netherands. Statistica Learning Theory: a Primer THEODOROS EVGENIOU AND MASSIMILIANO PONTIL Center for Bioogica and Computationa
More informationRecursive Constructions of Parallel FIFO and LIFO Queues with Switched Delay Lines
Recursive Constructions of Parae FIFO and LIFO Queues with Switched Deay Lines Po-Kai Huang, Cheng-Shang Chang, Feow, IEEE, Jay Cheng, Member, IEEE, and Duan-Shin Lee, Senior Member, IEEE Abstract One
More informationOptimal spaced seeds for faster approximate string matching
Optimal spaced seeds for faster approximate string matching Martin Farach-Colton Gad M. Landau S. Cenk Sahinalp Dekel Tsur Abstract Filtering is a standard technique for fast approximate string matching
More informationMassJoin: A MapReduce-based Method for Scalable String Similarity Joins
MassJoin: A MapReduce-based Method for Scaabe String Simiarity Joins Dong Deng, Guoiang Li, Shuang Hao, Jiannan Wang, Jianhua Feng Department of Computer Science, Tsinghua University, Beijing, China {dd11,
More informationPaper presented at the Workshop on Space Charge Physics in High Intensity Hadron Rings, sponsored by Brookhaven National Laboratory, May 4-7,1998
Paper presented at the Workshop on Space Charge Physics in High ntensity Hadron Rings, sponsored by Brookhaven Nationa Laboratory, May 4-7,998 Noninear Sef Consistent High Resoution Beam Hao Agorithm in
More informationAlgorithms to solve massively under-defined systems of multivariate quadratic equations
Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations
More informationSardinas-Patterson like algorithms in coding theory
A.I.Cuza University of Iaşi Facuty of Computer Science Sardinas-Patterson ike agorithms in coding theory Author, Bogdan Paşaniuc Supervisor, Prof. dr. Ferucio Laurenţiu Ţipea Iaşi, September 2003 Contents
More informationRate-Distortion Theory of Finite Point Processes
Rate-Distortion Theory of Finite Point Processes Günther Koiander, Dominic Schuhmacher, and Franz Hawatsch, Feow, IEEE Abstract We study the compression of data in the case where the usefu information
More informationUniversal Consistency of Multi-Class Support Vector Classification
Universa Consistency of Muti-Cass Support Vector Cassification Tobias Gasmachers Dae Moe Institute for rtificia Inteigence IDSI, 6928 Manno-Lugano, Switzerand tobias@idsia.ch bstract Steinwart was the
More informationAppendix A: MATLAB commands for neural networks
Appendix A: MATLAB commands for neura networks 132 Appendix A: MATLAB commands for neura networks p=importdata('pn.xs'); t=importdata('tn.xs'); [pn,meanp,stdp,tn,meant,stdt]=prestd(p,t); for m=1:10 net=newff(minmax(pn),[m,1],{'tansig','purein'},'trainm');
More informationSchool of Electrical Engineering, University of Bath, Claverton Down, Bath BA2 7AY
The ogic of Booean matrices C. R. Edwards Schoo of Eectrica Engineering, Universit of Bath, Caverton Down, Bath BA2 7AY A Booean matrix agebra is described which enabes man ogica functions to be manipuated
More informationSydU STAT3014 (2015) Second semester Dr. J. Chan 18
STAT3014/3914 Appied Stat.-Samping C-Stratified rand. sampe Stratified Random Samping.1 Introduction Description The popuation of size N is divided into mutuay excusive and exhaustive subpopuations caed
More information