arXiv:1207.0144v1 [cs.DB] 30 Jun 2012

Mining Statistically Significant Substrings using the Chi-Square Statistic

Mayank Sachan
Dept. of Computer Science and Engineering, Indian Institute of Technology, Kanpur, INDIA

Arnab Bhattacharya

ABSTRACT

The problem of identification of statistically significant patterns in a sequence of data has been applied to many domains such as intrusion detection systems, financial models, web-click records, automated monitoring systems, computational biology, cryptology, and text analysis. An observed pattern of events is deemed to be statistically significant if it is unlikely to have occurred due to randomness or chance alone. We use the chi-square statistic as a quantitative measure of statistical significance. Given a string of characters generated from a memoryless Bernoulli model, the problem is to identify the substring for which the empirical distribution of single letters deviates the most from the distribution expected from the generative Bernoulli model. This deviation is captured using the chi-square measure. The most significant substring (MSS) of a string is thus defined as the substring having the highest chi-square value. To date, to the best of our knowledge, there does not exist any algorithm to find the MSS in better than $O(n^2)$ time, where $n$ denotes the length of the string. In this paper, we propose an algorithm to find the most significant substring, whose running time is $O(n^{3/2})$ with high probability. We also study some variants of this problem such as finding the top-$t$ set, finding all substrings having chi-square greater than a fixed threshold and finding the MSS among substrings greater than a given length. We experimentally demonstrate the asymptotic behavior of the MSS on varying the string size and alphabet size. We also describe some applications of our algorithm on cryptology and real world data from finance and sports. Finally, we compare our technique with the existing heuristics for finding the MSS.

1. MOTIVATION

Statistical significance is used to ascertain whether the outcome of a given experiment can be ascribed to some extraneous factors or is solely due to chance. Given a string composed of characters from an alphabet $\Sigma = \{a_1, a_2, \ldots, a_k\}$ of constant size $k$, the null hypothesis assumes that the letters of the string are generated from a memoryless Bernoulli model. Each letter of the string is drawn randomly and independently from a fixed multinomial probability distribution $P = \{p_1, p_2, \ldots, p_k\}$ where $p_i$ denotes the probability of occurrence of character $a_i$ in the alphabet ($\sum_{i=1}^{k} p_i = 1$). The objective is to find the connected subregion of the string (i.e., a substring) for which the empirical distribution of single letters deviates the most from the distribution given by the Bernoulli model.

Detection of statistically relevant patterns in a sequence of events has drawn significant interest in the computer science community and has been diversely applied in many fields including molecular biology, cryptology, telecommunications, intrusion detection, automated monitoring, text mining, and financial modeling. The applications in computational biology include assessing the over-representation of exceptional patterns [7] and studying the mutation characteristics in the protein sequence of an organism by identifying the sudden changes in their mutation rates [18].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at The 38th International Conference on Very Large Data Bases, August 27th - 31st 2012, Istanbul, Turkey. Proceedings of the VLDB Endowment, Vol. 5, No. 10. Copyright 2012 VLDB Endowment.
Different studies suggest detecting intrusions in various information systems by searching for hidden patterns that are unlikely to occur [6, 7]. In telecommunication, it has been applied to detect periods of heavy traffic [13]. It has also been used in analyzing financial time series to reveal hidden temporal patterns that are characteristic and predictive of time series events [] and to predict stock prices [17].

Quantifying a substring as statistically significant depends on the statistical model used to calculate the deviation of the empirical distribution of single letters from its expected nature. The exact formulation of statistical significance depends on the metric used; p-value and z-score [3, 5] represent the two most commonly used ones (some of the other ones are reviewed in [, 4]). Research indicates that in most practical cases, p-value provides more precise and accurate results as compared to z-score [7].

The p-value is defined as the probability of obtaining a test statistic at least as extreme as the one that was actually observed assuming the null hypothesis to be true. For example, in an experiment to determine whether a coin is fair, suppose it turns up head on 19 out of 20 tosses. Assuming the null hypothesis, i.e., the coin is fair, to be true, the p-value is equal to the probability of observing 19 or more heads in 20 flips of a fair coin (see footnote 1):

\[ p\text{-value} = \Pr(19H) + \Pr(20H) = \frac{\binom{20}{19} + \binom{20}{20}}{2^{20}} \approx 0.002\% \]

Traditionally, the decision to reject or fail to reject the null hypothesis is based on a pre-defined significance level $\alpha$. If the p-value is low, the result is less likely assuming the null hypothesis to be true. Consequently, the observation is statistically more significant.

Footnote 1: This definition of p-value is part of a one-sided test; however, we can also calculate the probability of getting at least 19 heads or at least 19 tails, which is part of a two-sided test. The p-value is just double in this case due to symmetry.
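The coin-toss p-value can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name `one_sided_p_value` is a hypothetical helper, not something defined in the paper.

```python
from math import comb

def one_sided_p_value(heads: int, tosses: int) -> float:
    """Probability of seeing at least `heads` heads in `tosses` fair coin flips."""
    favourable = sum(comb(tosses, h) for h in range(heads, tosses + 1))
    return favourable / 2 ** tosses

p_one = one_sided_p_value(19, 20)   # one-sided test: 19 or 20 heads
p_two = 2 * p_one                   # two-sided test doubles it by symmetry
print(f"one-sided: {p_one:.6%}, two-sided: {p_two:.6%}")
```

The one-sided value is 21 favourable outcomes out of $2^{20}$, i.e. roughly 0.002%.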

In a memoryless Bernoulli multinomial model, the probability of observing a configuration $\beta_0$, given by a count vector $C = \{Y_1, Y_2, \ldots, Y_k\}$ with $\sum_{i=1}^{k} Y_i = l$ (where $l$ is the length of the substring) denoting the set of observed frequencies of each character in the alphabet, is defined as

\[ \Pr(C = \beta_0) = l! \prod_{i=1}^{k} \frac{p_i^{Y_i}}{Y_i!} \tag{1} \]

The p-value for this model then is

\[ p\text{-value} = \sum_{\beta \text{ more extreme than } \beta_0} \Pr(\beta) \tag{2} \]

However, computing the p-value exactly requires analyzing all possible outcomes of the experiment which are potentially exponential in number, thereby rendering the computation impractical. Moreover, it has been shown that for large samples, asymptotic approximations are accurate enough and easier to calculate [4]. The two broadly used approximations are the likelihood ratio statistic and the Pearson's chi-square statistic [4]. In case of the likelihood ratio test, an alternative hypothesis is set up under which each $p_i$ is replaced by its maximum likelihood estimate $\pi_i = x_i/n$ with the exact probability of a configuration under the null hypothesis defined similarly as in the previous case. The natural logarithm of the ratio between these two probabilities multiplied by $-2$ is then the statistic for the likelihood ratio test:

\[ -2\ln(LR) = 2 \sum_{i=1}^{k} x_i \ln\left(\frac{\pi_i}{p_i}\right) \tag{3} \]

Alternatively, the Pearson's chi-square statistic, denoted by $X^2$, measures the deviation of the observed frequency distribution from the theoretical distribution [5]:

\[ X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{k} \frac{(Y_i - l p_i)^2}{l p_i} \tag{4} \]

where $O_i$ and $E_i$ are the observed and theoretical (expected) frequencies of the characters in the substring. Since each letter of the substring is drawn from a fixed probability distribution, the expected frequency $E_i$ of a character in the substring is obtained by multiplying the length of the substring $l$ with the probability of occurrence of that character. Hence, the expected frequency vector is given by $E = lP$, where $P = \{p_1, p_2, \ldots, p_k\}$.
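As a small numerical illustration of (3) and (4), the sketch below evaluates both statistics for a count vector. The function names are hypothetical, and terms with a zero count are skipped in the likelihood ratio sum under the usual convention $0 \ln 0 = 0$ (an assumption not spelled out in the text).

```python
from math import log

def pearson_chi_square(counts, probs):
    """Pearson X^2 of Eq. (4): sum of (Y_i - l*p_i)^2 / (l*p_i)."""
    length = sum(counts)
    return sum((y - length * p) ** 2 / (length * p) for y, p in zip(counts, probs))

def likelihood_ratio_statistic(counts, probs):
    """-2 ln(LR) of Eq. (3) with pi_i = Y_i / l; zero counts contribute nothing."""
    length = sum(counts)
    return 2.0 * sum(y * log(y / (length * p)) for y, p in zip(counts, probs) if y > 0)

# Coin example: 19 heads and 1 tail under a fair-coin null model.
coin_x2 = pearson_chi_square([19, 1], [0.5, 0.5])
coin_lr = likelihood_ratio_statistic([19, 1], [0.5, 0.5])
print(coin_x2, coin_lr)
```

Both statistics are zero when the observed counts equal the expected counts and grow as the counts diverge from them.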
The chi-square ($X^2$) definition in (4) can be further simplified as:

\[ X^2 = \sum_{i=1}^{k} \frac{(Y_i - l p_i)^2}{l p_i} = \sum_{i=1}^{k} \frac{Y_i^2}{l p_i} - 2\sum_{i=1}^{k} Y_i + l \sum_{i=1}^{k} p_i = \sum_{i=1}^{k} \frac{Y_i^2}{l p_i} - l \qquad \left[\because \sum_{i=1}^{k} Y_i = l \text{ and } \sum_{i=1}^{k} p_i = 1\right] \tag{5} \]

Note that the chi-square value for a substring depends only on the count of the characters in it, and not on the order in which they appear. It can be seen in the coin toss example that all the outcomes that are less likely to occur have higher $X^2$ values than the observed outcome.

For multinomial models, under the null hypothesis, both the $X^2$ statistic and the $-2\ln(LR)$ statistic converge to the $\chi^2$ distribution with $k - 1$ degrees of freedom [1, 4]. Hence, the p-value of the outcome can then be computed using the cumulative distribution function (cdf) $F(x)$ of the $\chi^2(k-1)$ distribution. If $z_0$ is the $X^2$ value of the observed outcome, then its p-value is $1 - F(z_0)$. Moreover, it has also been shown that the $X^2$ statistic converges to the $\chi^2$ distribution from below as opposed to the $-2\ln(LR)$ statistic which converges from above [1, 4]. Thus, the chi-square statistic diminishes the probability of type-I errors (false positives). Considering these significant advantages, we adopt the Pearson's $X^2$ statistic as the estimate to quantify the statistical significance in our study.

In this paper, we focus on the problem where only portions of the string instead of the whole string may deviate from the expected behavior. As discussed in the experimental section, this problem is particularly useful in the analysis of temporal strings where an external event occurring in the middle of a string may be causing the particular substring to deviate significantly from the expected behavior by inflating or deflating the probabilities of occurrence of some characters in the alphabet. Our work focuses on the problem of identification of such statistically significant substrings in large strings. Before venturing forward, we formally define the different problem statements handled in this paper for a string $S$ of length $n$.

PROBLEM 1 (MOST SIGNIFICANT SUBSTRING).
Find the most significant substring (MSS) of $S$, which is the substring having the highest chi-square value ($X^2$) among all possible substrings.

PROBLEM 2 (TOP-t SUBSTRINGS). Find the top-$t$ set $T$ of $t$ substrings such that $|T| = t$ and for any two arbitrary substrings $S_1 \in T$ and $S_2 \notin T$, $X^2_{S_1} \ge X^2_{S_2}$.

PROBLEM 3 (SIGNIFICANCE GREATER THAN THRESHOLD). Find all substrings having chi-square value ($X^2$) greater than a given threshold $\alpha_0$.

PROBLEM 4 (MSS GREATER THAN GIVEN LENGTH). Find the substring having the highest chi-square value ($X^2$) among all substrings of length greater than $\gamma_0$.

The rest of the paper is organized as follows. Section 2 provides an overview of the related work. Section 3 formulates some important definitions and observations used by our algorithm. Section 4 describes the algorithm for finding the MSS of a string. Section 5 presents the analysis of the algorithm. Section 6 extends the MSS finding algorithm to the more general problems. Section 7 shows the experimental analysis and some applications of the algorithm on real datasets. Finally, Section 8 discusses possible future work.

2. RELATED WORK

The problem of identifying frequent and statistically relevant subsequences (not necessarily contiguous) in a sequence has been an active area of research over the past decade [19]. The problem of finding statistically significant subsequences within a window of size $w$ has also been addressed [3, 15]. Since the number of subsequences grows exponentially with $w$, the task of computing subsequences within a large window is practically infeasible. We address a different version of the problem where the window size can be arbitrarily large but statistically significant patterns are constrained to be contiguous, thus forming substrings of the given string. The problem has many relevant applications in places where the extraneous factor that triggers such unexpected patterns occurs continuously over an arbitrarily large period in the course of a sequence, as in the case of temporal strings.
As the possible number of substrings reduces to $O(n^2)$, the problem of computing statistically significant patterns becomes much more scalable. However, it is still computationally intensive for large data. The trivial algorithm proceeds by checking all $O(n^2)$ possible substrings. Some improvements such as the blocking technique and the heap strategy were proposed, but they showed no asymptotic improvement in the time complexity []. Two algorithms, namely, ARLM and AGMM, were proposed which use local maxima to find the MSS [9]. It was claimed (only through a conjecture and not a proof) that ARLM would find the MSS. However, its time complexity is still $O(n^2)$ with only constant time improvements. AGMM was an $O(n)$ time heuristic that found a substring whose $X^2$ value was roughly close to the $X^2$ value of the MSS, but no theoretical guarantees were provided on the bound of the approximation ratio. The comparative analysis of our algorithms with them is shown in detail in Section 7. To the best of our knowledge, no algorithm exists to date that exactly finds the MSS or solves the other variants of the problem in better than $O(n^2)$ time.

It may seem that a fast algorithm can be obtained using the suffix tree [14]. However, the problem at hand is different. To compute the $X^2$ value of any substring we need not traverse the whole substring; rather, we just need the number of occurrences of each character in that substring. This can be easily computed in $O(1)$ time by maintaining $k$ count arrays, one for each character of the alphabet, where the $i$-th element of the array stores the number of occurrences of the character up to the $i$-th position in the string. Each array can be preprocessed in $O(n)$ time. Furthermore, due to the complex non-linear nature of the $X^2$ function we assume that no obvious properties of the suffix trees or its invariants can be utilized.

The trivial algorithm checks all possible substrings that have $O(n)$ starting positions and for each starting position have $O(n)$ ending positions, thus requiring $O(n^2)$ time. Our algorithm also considers all the $O(n)$ starting positions, but for a particular starting position, it does not check all possible ending positions. Rather, it skips ending positions that cannot generate candidates for the MSS or the top-$t$ set. We show that for a particular starting position, we check only $O(\sqrt{n})$ different ending positions, thereby scanning a total of only $O(n^{3/2})$ substrings. We formally show that the running time of our algorithm is $O(n^{3/2})$.
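The count arrays and the trivial $O(n^2)$ scan described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the names `build_prefix_counts`, `chi_square` and `naive_mss` are hypothetical, and positions are 1-indexed to match the paper's notation.

```python
def build_prefix_counts(s, alphabet):
    """prefix[j][p] = number of occurrences of alphabet[j] in the first p characters."""
    index = {ch: j for j, ch in enumerate(alphabet)}
    prefix = [[0] * (len(s) + 1) for _ in alphabet]
    for pos, ch in enumerate(s, start=1):
        for row in prefix:
            row[pos] = row[pos - 1]
        prefix[index[ch]][pos] += 1
    return prefix

def chi_square(counts, probs):
    """X^2 in the simplified form of Eq. (5): sum(Y_i^2 / (l * p_i)) - l."""
    length = sum(counts)
    return sum(y * y / (length * p) for y, p in zip(counts, probs)) - length

def naive_mss(s, alphabet, probs):
    """Check every substring S[i..j]; returns (best X^2, i, j) with 1-indexed bounds."""
    prefix = build_prefix_counts(s, alphabet)
    best = (0.0, 0, 0)
    for i in range(1, len(s) + 1):
        for j in range(i, len(s) + 1):
            # Character counts of S[i..j] by subtracting two prefix entries per character.
            counts = [row[j] - row[i - 1] for row in prefix]
            value = chi_square(counts, probs)
            if value > best[0]:
                best = (value, i, j)
    return best

result = naive_mss("abaaaab", "ab", [0.5, 0.5])
print(result)
```

For the string `abaaaab` under a uniform two-letter model, the scan picks the run `aaaa` (positions 3 to 6), whose $X^2$ value is 4.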
We also extend the algorithm for finding the top-$t$ substrings and other variants, all of which, again, run in $O(n^{3/2})$ time.

3. DEFINITIONS AND OBSERVATIONS

In the rest of the paper, any string $S$ over a multinomial alphabet $\Sigma = \{a_1, a_2, \ldots, a_k\}$ and drawn from a fixed probability distribution $P = \{p_1, p_2, \ldots, p_k\}$ is phrased as "$S$ over $(\Sigma, P)$". For a given string $S$ of length $n$, $S[i]$ ($1 \le i \le n$) denotes the $i$-th letter of the string $S$ and $S[i \ldots j]$ denotes the substring of $S$ from index $i$ to index $j$, both included. So, the complete string $S$ can also be denoted by $S[1 \ldots n]$.

DEFINITION 1 (CHAIN COVER). For any string $S$ of length $l$, a string $\lambda(S, a_i, l_1)$ of length $l + l_1$ is said to be the chain cover of $S$ over $l_1$ symbols of character $a_i$ if $S$ is the prefix of $\lambda(S, a_i, l_1)$ and the last $l_1$ positions of $\lambda(S, a_i, l_1)$ are occupied by the character $a_i$. Alternatively, $\lambda(S, a_i, l_1)$ is of the form $S$ followed by $l_1$ occurrences of character $a_i$.

For example, if $S$ = dbb then $\lambda(S, d, 3)$ = dbbddd, and if $S$ = baad then $\lambda(S, a, 2)$ = baadaa.

We first prove that for any string $S$ of length $l$, the $X^2$ value of any string $S'$ of length less than or equal to $l + l_1$ and having $S$ as its prefix is upper bounded by the $X^2$ value of a chain cover of $S$ over $l_1$ symbols of some character $a_i \in \Sigma$.

Footnote (on suffix trees): A suffix tree is a data structure that can be built in $\Theta(n)$ time. The power of suffix trees lies in quickly finding a particular substring of the string. It provides a fast implementation of many important string operations.

LEMMA 1. Let $S$ be any given string of length $l$ over $(\Sigma, P)$ with count vector denoted by $\{Y_1, Y_2, \ldots, Y_k\}$ where each $Y_i \ge 0$ and $\sum_{i=1}^{k} Y_i = l$. Let $S'$ be any string which has $S$ as its prefix and is of length $l + l_1$. Then there exists some character $a_j \in \Sigma$ such that the $X^2$ value of $S'$ is upper bounded by the $X^2$ value of the cover string $\lambda(S, a_j, l_1)$. The character $a_j$ is such that it has the maximum value of $\frac{2Y_j + l_1}{p_j}$ among all $j \in \{1, 2, \ldots, k\}$.

PROOF. Let the $X^2$ values of strings $S$, $S'$ and $\lambda(S, a_j, l_1)$ be denoted by $X^2_S$, $X^2_{S'}$ and $X^2_\lambda$ respectively. We need to prove that $X^2_{S'} \le X^2_\lambda$.
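By definition, the count vector of $\lambda(S, a_j, l_1)$ is $\{Y_1, Y_2, \ldots, Y_j + l_1, \ldots, Y_k\}$. Further, let $Y'_i$ denote the frequency of character $a_i$ in $S'$ that are not present in $S$ (i.e., the frequency of $a_i$ in the $l_1$-length suffix of $S'$). So, the count vector of $S'$ is $\{Y_1 + Y'_1, Y_2 + Y'_2, \ldots, Y_k + Y'_k\}$ where each $Y'_m \ge 0$ and $\sum_{m=1}^{k} Y'_m = l_1$. From the definition of the $X^2$ statistic given in (5), we have

\[ X^2_S = \sum_{m=1}^{k} \frac{Y_m^2}{p_m l} - l \tag{6} \]

\[ X^2_\lambda = \sum_{m \ne j} \frac{Y_m^2}{p_m (l + l_1)} + \frac{(Y_j + l_1)^2}{p_j (l + l_1)} - (l + l_1) = \sum_{m=1}^{k} \frac{Y_m^2}{p_m (l + l_1)} + \frac{2 Y_j l_1 + l_1^2}{p_j (l + l_1)} - (l + l_1) \tag{7} \]

\[ X^2_{S'} = \sum_{m=1}^{k} \frac{(Y_m + Y'_m)^2}{p_m (l + l_1)} - (l + l_1) = \sum_{m=1}^{k} \frac{Y_m^2}{p_m (l + l_1)} + \sum_{m=1}^{k} \frac{2 Y_m Y'_m + Y'^2_m}{p_m (l + l_1)} - (l + l_1) \tag{8} \]

The character $a_j$ is chosen such that it maximizes the quantity $\frac{2Y_j + l_1}{p_j}$ over all characters of the alphabet. So for any other character $a_m$ where $m \in \{1, 2, \ldots, k\}$, since $Y'_m \le l_1$, we have

\[ \frac{2 Y_m + Y'_m}{p_m} \le \frac{2 Y_m + l_1}{p_m} \le \frac{2 Y_j + l_1}{p_j} \tag{9} \]

Multiplying (9) by $Y'_m$ and summing it over $m$ we get

\[ \sum_{m=1}^{k} \frac{2 Y_m Y'_m + Y'^2_m}{p_m} \le \frac{2 Y_j + l_1}{p_j} \sum_{m=1}^{k} Y'_m = \frac{2 Y_j l_1 + l_1^2}{p_j} \tag{10} \]

From (7), (8) and (10) we have

\[ X^2_{S'} \le \sum_{m=1}^{k} \frac{Y_m^2}{p_m (l + l_1)} + \frac{2 Y_j l_1 + l_1^2}{p_j (l + l_1)} - (l + l_1) = X^2_\lambda. \]

The next lemma states that the $X^2$ value of a string can always be increased by adding a particular character to it.

LEMMA 2. Let $S$ be any given string of length $l$ over $(\Sigma, P)$ with count vector denoted by $\{Y_1, Y_2, \ldots, Y_k\}$ where each $Y_i \ge 0$ and $\sum_{i=1}^{k} Y_i = l$. There always exists some character $a_j$ such that by appending it to $S$, the $X^2$ value of the resultant string $S'$ becomes greater than that of $S$. The character $a_j$ is such that it has the maximum value of $\frac{Y_j}{p_j}$ among all $j \in \{1, 2, \ldots, k\}$.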
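Lemma 1 and the statement of Lemma 2 can be sanity-checked by brute force on a small count vector. The sketch below is only a numerical illustration, not a proof; all function names are hypothetical, and the enumeration is exponential in $l_1$, so it is meant for tiny inputs only.

```python
from itertools import product

def chi_square(counts, probs):
    """X^2 in the simplified form of Eq. (5)."""
    length = sum(counts)
    return sum(y * y / (length * p) for y, p in zip(counts, probs)) - length

def chain_cover_bound(counts, probs, l1):
    """X^2 of the chain cover over l1 symbols of the character maximizing (2*Y_j + l1)/p_j."""
    j = max(range(len(counts)), key=lambda m: (2 * counts[m] + l1) / probs[m])
    cover = list(counts)
    cover[j] += l1
    return chi_square(cover, probs)

def best_extension(counts, probs, l1):
    """Largest X^2 over every possible suffix of exactly l1 characters."""
    best = float("-inf")
    for suffix in product(range(len(counts)), repeat=l1):
        extended = list(counts)
        for ch in suffix:
            extended[ch] += 1
        best = max(best, chi_square(extended, probs))
    return best

def append_best_character(counts, probs):
    """Append one character with the largest Y_j / p_j, as in Lemma 2."""
    j = max(range(len(counts)), key=lambda m: counts[m] / probs[m])
    grown = list(counts)
    grown[j] += 1
    return grown

counts, probs = [1, 2, 1], [0.2, 0.3, 0.5]
for l1 in range(1, 5):
    print(l1, best_extension(counts, probs, l1), chain_cover_bound(counts, probs, l1))
```

On this example the best extension coincides with the chain cover for every $l_1$ tried, and appending the character chosen by Lemma 2 raises $X^2$.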

PROOF. Let the $X^2$ values of strings $S$ and $S'$ be denoted by $X^2_S$ and $X^2_{S'}$ respectively. We need to prove that $X^2_S < X^2_{S'}$. The string $S'$ is the resultant string obtained by appending character $a_j$ to the string $S$, so the count vector of $S'$ is $\{Y_1, \ldots, Y_j + 1, \ldots, Y_k\}$. From (5), we have

\[ X^2_S = \sum_{m=1}^{k} \frac{Y_m^2}{p_m l} - l \tag{11} \]

\[ X^2_{S'} = \sum_{m=1}^{k} \frac{Y_m^2}{p_m (l + 1)} + \frac{2 Y_j + 1}{p_j (l + 1)} - (l + 1) \tag{12} \]

From (11) and (12) we have

\[ X^2_{S'} - X^2_S = -\sum_{m=1}^{k} \frac{Y_m^2}{p_m\, l (l + 1)} + \frac{2 Y_j + 1}{p_j (l + 1)} - 1 = \frac{1}{l(l+1)} \left[ \frac{(2 Y_j + 1)\, l}{p_j} - \sum_{m=1}^{k} \frac{Y_m^2}{p_m} - l^2 - l \right] \tag{13} \]

The character $a_j$ is chosen such that it maximizes $\frac{Y_j}{p_j}$ over all $j$. So we have

\[ \frac{Y_m}{p_m} \le \frac{Y_j}{p_j} \quad \forall m \in \{1, 2, \ldots, k\} \tag{14} \]

Multiplying (14) by $Y_m$ and summing it over $m$ we get

\[ \sum_{m=1}^{k} \frac{Y_m^2}{p_m} \le \frac{Y_j}{p_j} \sum_{m=1}^{k} Y_m = \frac{l\, Y_j}{p_j} \tag{15} \]

Putting (15) into (13) we get

\[ X^2_{S'} - X^2_S \ge \frac{1}{l(l+1)} \left[ \frac{(2 Y_j + 1)\, l}{p_j} - \frac{l\, Y_j}{p_j} - l^2 - l \right] = \frac{1}{l(l+1)} \left[ l \left( \frac{Y_j}{p_j} - l \right) + l \left( \frac{1}{p_j} - 1 \right) \right] \tag{16} \]

Again, from (14) we have:

\[ Y_m \le p_m \frac{Y_j}{p_j} \;\Rightarrow\; \sum_{m=1}^{k} Y_m \le \frac{Y_j}{p_j} \sum_{m=1}^{k} p_m \;\Rightarrow\; l \le \frac{Y_j}{p_j} \tag{17} \]

Putting (17) into (16) and using $p_j < 1$, we get $X^2_{S'} - X^2_S > 0$.

In the next result, we show that the $X^2$ value of any string $S'$ having $S$ as its prefix is upper bounded by the $X^2$ value of the chain cover of $S$.

THEOREM 1. Let $S$ be any given string of length $l$ over $(\Sigma, P)$ with count vector denoted by $\{Y_1, Y_2, \ldots, Y_k\}$ where each $Y_i \ge 0$ and $\sum_{i=1}^{k} Y_i = l$. Further, let $S'$ be any string which has $S$ as its prefix and is of length less than or equal to $l + l_1$. Then there exists some character $a_j \in \Sigma$ such that the $X^2$ value of $S'$ is upper bounded by the $X^2$ value of $\lambda(S, a_j, l_1)$. The character $a_j$ is such that it has the maximum value of $\frac{2Y_j + l_1}{p_j}$ among all $j \in \{1, \ldots, k\}$.

PROOF. The proof follows directly from the results stated in Lemma 1 and Lemma 2. From Lemma 2, we can say that there always exists a character such that appending it increases the $X^2$ value of $S'$. Hence, we keep appending the string $S'$ with such characters till its length becomes $l + l_1$. We call the resultant string $S''$. Clearly, $S''$ has $S$ as its prefix and is of length $l + l_1$, and the $X^2$ value of $S'$ is less than or equal to the $X^2$ value of $S''$. The character $a_j$ is such that it maximizes $\frac{2Y_j + l_1}{p_j}$ over all $j \in \{1, 2, \ldots, k\}$; so using Lemma 1, we can say that the $X^2$ value of $S''$ is upper bounded by the $X^2$ value of $\lambda(S, a_j, l_1)$. This further implies that the $X^2$ value of $S'$ is less than or equal to the $X^2$ value of $\lambda(S, a_j, l_1)$.

We next formally describe our algorithm for finding the most significant substring (MSS).

4. THE MSS ALGORITHM

The algorithm looks for the possible candidates of MSS in an ordered fashion. The pseudocode is shown in Algorithm 1. The loop in line 2 iterates over the start positions of the substrings while the loop in line 3 iterates over all the possible lengths of the substrings from a particular start position. We keep track of the maximum $X^2$ value of any substring computed by our algorithm by storing it in a variable $X^2_{max}$. For a given substring $S[i \ldots l']$, we calculate its $X^2$ value, which is stored in $X^2_l$ (line 5). If $X^2_l$ turns out to be greater than $X^2_{max}$ then $X^2_{max}$ is updated accordingly (line 7).

Algorithm 1 Algorithm for finding the most significant substring (MSS)
 1: $X^2_{max} \leftarrow 0$
 2: for $i \leftarrow n$ to $1$ do
 3:   for $l \leftarrow 0$ to $n - i$ do
 4:     $l' \leftarrow i + l$
 5:     $X^2_l \leftarrow$ $X^2$ value of $S[i \ldots l']$
 6:     if $X^2_l > X^2_{max}$ then
 7:       $X^2_{max} \leftarrow X^2_l$
 8:     end if
 9:     $t \leftarrow m$ such that, over all $m \in \{1, 2, \ldots, k\}$, $\frac{2Y_m + x}{p_m}$ is maximum
10:     $a \leftarrow 1 - p_t$
11:     $b \leftarrow 2Y_t - 2 l p_t - p_t X^2_{max}$
12:     $c \leftarrow (X^2_l - X^2_{max})\, l\, p_t$
13:     $x \leftarrow \frac{-b + \sqrt{b^2 - 4ac}}{2a}$
14:     Increment $l$ by $x$
15:   end for
16:   $i \leftarrow i - 1$
17: end for
18: return $X^2_{max}$

The character $a_t$ is chosen such that it maximizes the value of $\frac{2Y_j + x}{p_j}$ over all $j$ (line 9 of the pseudocode). This property is necessary for the application of the result stated in Theorem 1. Denoting the $X^2$ value of a chain cover of $S[i \ldots l']$ over $x$ symbols of character $a_t$ by $X^2_\lambda$, the result stated in Theorem 1 states that the $X^2$ value of any substring of the form $S[i \ldots (l' + m)]$ for $m \in \{0, 1, \ldots, x\}$ is upper bounded by $X^2_\lambda$. We choose $x$ such that it is maximized within the constraint that $X^2_\lambda$ is guaranteed to be less than or equal to $X^2_{max}$. Then, under the given constraint, we can skip checking all substrings of the form $S[i \ldots (l' + m)]$ for $m \in \{0, 1, \ldots, x\}$ as their $X^2$ values are not greater than $X^2_{max}$.
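The skipping idea of Algorithm 1 can be sketched in Python as below. This is a hedged illustration, not the authors' code. Two choices are assumptions of this sketch: (a) because the choice of $t$ in line 9 depends on $x$, the sketch evaluates the positive root of the quadratic derived in this section for every character and uses the smallest one, which keeps every chain cover within the bound; (b) the end position advances by one more than the integer part of that root, so the loop always makes progress. All function names are hypothetical, and every probability is assumed to lie strictly between 0 and 1.

```python
from math import sqrt

def prefix_counts(s, alphabet):
    """prefix[j][p] = occurrences of alphabet[j] among the first p characters."""
    index = {ch: j for j, ch in enumerate(alphabet)}
    prefix = [[0] * (len(s) + 1) for _ in alphabet]
    for pos, ch in enumerate(s, start=1):
        for row in prefix:
            row[pos] = row[pos - 1]
        prefix[index[ch]][pos] += 1
    return prefix

def chi_square(counts, probs):
    """X^2 in the simplified form of Eq. (5)."""
    length = sum(counts)
    return sum(y * y / (length * p) for y, p in zip(counts, probs)) - length

def safe_advance(counts, probs, x2_l, x2_max):
    """How far the end position may move without missing a substring above x2_max."""
    length = sum(counts)
    roots = []
    for y_t, p_t in zip(counts, probs):
        a = 1.0 - p_t                                      # assumes p_t < 1
        b = 2.0 * y_t - 2.0 * length * p_t - p_t * x2_max
        c = (x2_l - x2_max) * length * p_t                 # <= 0 since x2_max >= x2_l
        roots.append((-b + sqrt(b * b - 4.0 * a * c)) / (2.0 * a))
    # Extensions by 1..floor(min root) cannot beat x2_max; the small slack guards
    # against floating-point overshoot.
    return int(max(0.0, min(roots) - 1e-9)) + 1

def mss_with_skipping(s, alphabet, probs):
    """Returns (largest X^2 over all substrings, number of substrings evaluated)."""
    n = len(s)
    prefix = prefix_counts(s, alphabet)
    x2_max, evaluated = 0.0, 0
    for i in range(n, 0, -1):            # start positions, right to left
        end = i
        while end <= n:
            counts = [row[end] - row[i - 1] for row in prefix]
            x2_l = chi_square(counts, probs)
            evaluated += 1
            x2_max = max(x2_max, x2_l)
            end += safe_advance(counts, probs, x2_l, x2_max)
    return x2_max, evaluated
```

A typical call would be `mss_with_skipping("abaaaab", "ab", [0.5, 0.5])`; the second component of the result shows how many of the $n(n+1)/2$ substrings were actually evaluated.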
So, we directly increment $l$ by $x$ (line 14). Next, we find out what the ideal choice of $x$ is. We denote the count vector of substring $S[i \ldots l']$ of length $l$ by $\{Y_1, Y_2, \ldots, Y_k\}$. The count vector of the cover chain is given by $\{Y_1, Y_2, \ldots, Y_t + x, \ldots, Y_k\}$ where $Y_t$ denotes the frequency of character $a_t$ in the algorithm. By definition of $X^2$ from (5),

\[ X^2_l = \sum_{m=1}^{k} \frac{Y_m^2}{p_m l} - l, \quad \text{so that} \quad \sum_{m=1}^{k} \frac{Y_m^2}{p_m} = l (X^2_l + l) \tag{18} \]

and

\[ X^2_\lambda = \sum_{m=1}^{k} \frac{Y_m^2}{p_m (l + x)} + \frac{2 x Y_t + x^2}{p_t (l + x)} - (l + x) = \frac{l (X^2_l + l)}{l + x} + \frac{2 x Y_t + x^2}{p_t (l + x)} - (l + x) \tag{19} \]

We want to maximize $x$ with the constraint that $X^2_\lambda \le X^2_{max}$. From (19) we have,

\[ \frac{l (X^2_l + l)}{l + x} + \frac{2 x Y_t + x^2}{p_t (l + x)} - (l + x) \le X^2_{max} \tag{20} \]

On multiplying (20) by $(l + x)\, p_t$ and rearranging, the constraint simplifies to

\[ (1 - p_t)\, x^2 + (2 Y_t - 2 l p_t - p_t X^2_{max})\, x + (X^2_l - X^2_{max})\, l\, p_t \le 0 \tag{21} \]

Eq. (21) is a quadratic inequality in $x$ with $a = 1 - p_t > 0$, $b = 2 Y_t - 2 l p_t - p_t X^2_{max}$ and $c = (X^2_l - X^2_{max})\, l\, p_t \le 0$ ($\because X^2_l \le X^2_{max}$). We need to maximize $x$ with the constraint that $a x^2 + b x + c \le 0$. Thus, we choose $x$ as the positive root of the quadratic equation:

\[ x = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \tag{22} \]

Since $a > 0$ and $c \le 0$ we have $x \ge 0$. Further, since $x$ has to be an integer, we round the above value to an integer (line 13 of the algorithm).

5. ANALYSIS OF THE MSS ALGORITHM

We first show that the running time of the algorithm on an input string generated from a memoryless Bernoulli model is $O(k n^{3/2})$ with high probability where $n$ and $k$ denote the string and alphabet size respectively. For a string not generated from the null model, we will argue that the time taken by our algorithm on that string is less than the time taken by our algorithm on an equivalent string of the same size generated from the null model. Hence, the time complexity of our algorithm for any input string is $O(k n^{3/2})$ with high probability.

Let $S$ be any string drawn from a memoryless Bernoulli model. Let $T_{ij}$ denote the random variable that takes value 1 if $a_i$ occurs at position $S[j]$ and 0 otherwise. Each character of the string $S$ is independently drawn from a fixed probability distribution $P$, so the probability that $T_{ij} = 1$ is $p_i$. The frequency of character $a_i$ in the string $S$, denoted by the random variable $Y_i$, is the sum of $n$ Bernoulli random variables $T_{ij}$ where $j$ ranges from 1 to $n$. Since $Y_i$ is the sum of $n$ i.i.d.
(independent and identically distributed) Bernoulli random variables, each having a success probability $p_i$, $Y_i$ follows a binomial distribution with parameters $n$ and $p_i$.

\[ T_{ij} \sim \text{Bernoulli}(p_i), \quad Y_i = \sum_{j=1}^{n} T_{ij} \;\Rightarrow\; Y_i \sim \text{Binomial}(n, p_i) \tag{23} \]

We state the following two standard results from the domain of probability distributions.

THEOREM 2. For large values of $n$, the Binomial($n$, $p$) distribution converges to the Normal($\mu$, $\sigma^2$) distribution with the same mean and variance, i.e., $\mu = np$ and $\sigma^2 = np(1 - p)$.

PROOF. The proof uses the result of the Central Limit Theorem. Please refer to [4] for the detailed proof (see footnote 3).

It has been shown in [1] that for both $np$ and $n(1-p)$ greater than a constant $c$ (see footnote 4), the binomial distribution can be approximated by the normal distribution. Since all the probabilities $p_i$ in our setting are fixed, we can always find a constant (say $c$) such that for all $n$ greater than $c$, every $Y_i$ follows the $N(n p_i, n p_i (1 - p_i))$ distribution. We use the following result to obtain the distribution of the $X^2$ statistic of any substring from a string generated using the null model.

THEOREM 3. Let the random variables $Y_i$, $i \in \{1, \ldots, k\}$, follow the $N(n p_i, n p_i (1 - p_i))$ distribution with $\sum_{i=1}^{k} p_i = 1$ and the additional constraint that $\sum_{i=1}^{k} Y_i = n$. The random variable

\[ X^2 = \sum_{i=1}^{k} \frac{(Y_i - n p_i)^2}{n p_i} \tag{24} \]

then follows the chi-square distribution with $(k - 1)$ degrees of freedom, denoted by $\chi^2(k-1)$.

PROOF. It has to be noted that all $Y_i$'s in the theorem are not independent but have an added constraint that $\sum_{i=1}^{k} Y_i = n$. This is precisely the reason why the degrees of freedom of the chi-square distribution is $k - 1$ instead of $k$. A well known result is that the sum of squares of $k$ independent standard normal random variables follows a $\chi^2(k)$ distribution. The proof (which is slightly complicated) follows directly from this well known result. Please refer to [0] for the detailed proof.

We will next prove that with high probability, the $X^2$ value of the MSS of $S$ generated using the null model is greater than $\ln n$. However, before that, we prove another useful result using elementary probability theory.

LEMMA 3.
Let $Z_{max}$ denote the maximum of $m$ i.i.d. random variables following the $\chi^2(k)$ distribution. Then with probability at least $1 - O(1/m^c)$, for sufficiently large $m$ and for any constant $c > 0$, $\ln m \le Z_{max}$.

PROOF. We first show this for $k = 2$. Let $f(x)$ and $F(x)$ denote the pdf and cdf of the $\chi^2(2)$ distribution:

\[ f(x; 2) = \tfrac{1}{2} e^{-x/2}, \qquad F(x; 2) = 1 - e^{-x/2} \tag{25} \]

We have

\[ Z_{max} = \max\{Z_1, Z_2, \ldots, Z_m\}, \quad \forall i,\; Z_i \sim \chi^2(k) \tag{26} \]

For any constant $c > 0$ we have:

\[ \Pr\{Z_{max} > \ln m\} = \Pr\{\exists i \text{ s.t. } Z_i > \ln m\} = 1 - \Pr\{\forall i,\; Z_i \le \ln m\} = 1 - \left(\Pr\{Z_i \le \ln m\}\right)^m = 1 - \left(1 - e^{-\frac{1}{2}\ln m}\right)^m = 1 - \left(1 - \frac{1}{\sqrt{m}}\right)^m \ge 1 - e^{-\sqrt{m}} \ge 1 - O(1/m^c) \tag{27} \]

In the above proof we only utilized the asymptotic behavior of the pdf and cdf of the $\chi^2(k)$ distribution. Since for any general $k$, the asymptotic behavior of the pdf and cdf of the $\chi^2(k)$ distribution has the same dominating term $e^{-x/2}$, the above result is valid for any given $k$ (see footnote 5).

Footnote 3: In the above approximation, we can think of the binomial distribution as the discrete version of the normal distribution having the same mean and variance. So we do not need to account for the approximation error using the Berry-Esseen theorem [8].

Footnote 4: In general, the value of this constant is taken as 5 [1].

LEMMA 4. In the MSS algorithm, at any iteration in the loop over $i$, $X^2_{max} > \ln n'$ with probability at least $1 - O(1/n'^c)$ where $n' = n - i$.

PROOF. We can verify from the pseudocode (Algorithm 1) that before we begin the loop in line 2 for $i = i_0$, we have checked all the substrings that are potential candidates for the MSS of $S$ starting at $i > i_0$. So, at this instance, the variable $X^2_{max}$ stores the maximum $X^2$ value of any substring of the string $S[(i_0 + 1) \ldots n]$. In other words, the variable $X^2_{max}$ would store the maximum of $\binom{n'}{2} = O(n'^2)$ (where $n' = n - i_0$) random variables each following the same $\chi^2(k-1)$ distribution. However, since these $O(n'^2)$ substrings are not mutually independent, the result of Lemma 3 cannot be directly applied in this case. However, we can still say that a subset of at least $O(n')$ substrings are independent, with each substring following a $\chi^2(k-1)$ distribution. One way of constructing a mutually independent subset of size $O(n')$ is by choosing $n'/c$ substrings each of length $c$ such that they do not share any character among them, i.e., the $i$-th substring in this set is the $i$-th block of $c$ consecutive characters, where $c$ is a constant such that the binomial distribution can be approximated by the normal distribution for all strings of length greater than or equal to $c$. Since all characters of the string $S$ are drawn independently from a fixed probability distribution, all the substrings in the subset are mutually independent, and since the lengths of all these substrings are at least $c$, the $X^2$ statistics of these substrings follow the $\chi^2(k-1)$ distribution. Consequently, the value of $X^2_{max}$ in our algorithm is greater than the maximum of at least $O(n')$ i.i.d. $\chi^2(k-1)$ random variables. Putting the value of $m = n'/c$ in the result of Lemma 3, we can prove the above result.

LEMMA 5.
On an input string generated from the null model, with high probability ($> 1 - \epsilon$ for any constant $\epsilon > 0$) the number of substrings skipped (denoted by $x$) in any iteration of the loop on $l$ in the MSS algorithm is $\omega(\sqrt{l})$ for sufficiently large values of $l$. Hence, $\epsilon$ can be set so close to 0 that with probability practically equal to 1, the number of substrings skipped $x$ in any iteration is at least $\omega(\sqrt{l})$.

PROOF. As stated in (22), the number of substrings skipped in any iteration of the loop on $l$ is

\[ x = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \tag{28} \]

We will prove that in the string generated from the null model, with high probability $b \le \frac{1}{2}\sqrt{l p_t \ln l}$ and $c \le -\frac{1}{2} l p_t \ln l$. These bounds help us in guaranteeing that $x = \omega(\sqrt{l})$ with high probability. In order to prove the bounds on $b$ and $c$, we first prove that the following conditions hold with high probability.

(i) From the result stated in Lemma 4, for any constant $\epsilon_1 > 0$, we have with probability at least $1 - O(1/n'^c) > 1 - \epsilon_1$ that $X^2_{max} > \ln n'$ where $n' = n - i$. In the algorithm, $l$ in the loop iterates from 0 to $n - i$, so we have $l \le n'$. Hence, $X^2_{max} > \ln l$ with probability at least $1 - \epsilon_1$.

(ii) Suppose $Y_t$ denotes the frequency of character $a_t$ in the string $S[i \ldots l']$ of length $l$. As denoted in (23) it is the sum of $l$ independent Bernoulli random variables $T_{ij}$ each with expectation $p_t$; so, $E[Y_t] = l p_t$. Also, we have $\Pr(T_{ij} \in [0, 1]) = 1$. Now, using Hoeffding's inequality [16], we get

\[ \Pr\{Y_t - E[Y_t] < t\} \ge 1 - \exp\left(-\frac{2 t^2}{\sum_{i=1}^{l} (b_i - a_i)^2}\right) \tag{29} \]

Substituting $E[Y_t] = l p_t$, $t = \frac{1}{4}\sqrt{l p_t \ln l}$, $a_i = 0$ and $b_i = 1$, we have for any constant $\epsilon_2 > 0$

\[ \Pr\left\{Y_t - l p_t < \tfrac{1}{4}\sqrt{l p_t \ln l}\right\} \ge 1 - e^{-\frac{2\, l p_t \ln l}{16\, l}} = 1 - \frac{1}{l^{\,p_t/8}} \ge 1 - \epsilon_2 \tag{30} \]

Footnote 5: The term $x^{k/2 - 1} e^{-x/2}$ occurring in the pdf of a general $k$ is asymptotically less than $e^{-x/(2+\epsilon)}$ and greater than $e^{-x/(2-\epsilon)}$ for any $\epsilon > 0$, which is independent of $k$.

(iii) As stated in Theorem 3, the $X^2$ value of substring $S[i \ldots l']$ of length $l$, denoted by $X^2_l$, follows the $\chi^2$ distribution.
Further, using the definition of the cdf of the $\chi^2$ distribution denoted by $F_x$, we have for any constant $\epsilon_3 > 0$

\[ \Pr\left\{X^2_l < \frac{\ln l}{2}\right\} = F_x\left(\frac{\ln l}{2}\right) = 1 - e^{-\frac{\ln l}{4}} \ge 1 - \epsilon_3 \tag{31} \]

We choose constants $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$ small enough such that for any constant $\epsilon > 0$, $1 - \epsilon_1 - \epsilon_2 - \epsilon_3 > 1 - \epsilon$. Thus combining the above three conditions, the following results hold with probability $1 - \epsilon$:

\[ b = 2(Y_t - l p_t) - p_t X^2_{max} \le 2(Y_t - l p_t) \le \tfrac{1}{2}\sqrt{l p_t \ln l} \tag{32} \]

\[ c = l p_t (X^2_l - X^2_{max}) \le l p_t \left(\tfrac{1}{2}\ln l - \ln l\right) = -\tfrac{1}{2}\, l p_t \ln l \tag{33} \]

\[ a = 1 - p_t \le 1 \tag{34} \]

We use the fact that if any positive $x$ satisfies the equation $a' x^2 + b' x + c' = 0$ then it also satisfies the inequality $a x^2 + b x + c \le 0$ if $a \le a'$, $b \le b'$ and $c \le c'$. So substituting the upper bounds of $a$, $b$ and $c$ in (28) and maximizing $x$ in (28) we have with probability $1 - \epsilon$

\[ x \ge \frac{1}{2}\left(\sqrt{\tfrac{1}{4}\, l p_t \ln l + 2\, l p_t \ln l} - \tfrac{1}{2}\sqrt{l p_t \ln l}\right) = \frac{1}{2}\left(\sqrt{\tfrac{9}{4}\, l p_t \ln l} - \tfrac{1}{2}\sqrt{l p_t \ln l}\right) = \tfrac{1}{2}\sqrt{l p_t \ln l} = \Omega(\sqrt{l \ln l}) = \omega(\sqrt{l}) \tag{35} \]

Further, in Algorithm 1, except line 9, all the steps inside the loop over $l$ in line 3 can be performed in constant time. However, if we can determine the frequencies of all of the characters in the substring $S[i \ldots l']$ in $O(1)$ time, then we can find the character $a_t$ (line 9) in $O(k)$ time. For this purpose, we maintain one count array for each character $a_t$, $t = 1, \ldots, k$, where the $i$-th element of the count array stores the number of occurrences of $a_t$ up to the $i$-th position in the string. Each count array can be preprocessed in $O(n)$ time. Consequently, each iteration of the loop over $l$ in line 3 takes $O(k)$ time. Further, the loop over $i$ in line 2 iterates $n$ times. Now, we only need to compute the number of iterations of the loop over $l$, for which we use the next lemma.

LEMMA 6. The expected number of iterations of the loop on $l$ (in line 3 of the MSS algorithm) for each value of $i$ is $O(\sqrt{n})$.
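The growth rate claimed in Lemma 6 can be illustrated numerically by iterating a skip of exactly $\sqrt{l}$ per step. That skip size is an assumption made for this sketch: Lemma 5 only says the skip is asymptotically larger, so the count below is a pessimistic stand-in rather than a measurement of the algorithm. The function name is hypothetical.

```python
from math import sqrt

def iterations_to_reach(n):
    """Number of steps for l to grow from 1 to n when each step adds sqrt(l)."""
    length, steps = 1.0, 0
    while length < n:
        length += sqrt(length)   # assumed skip of sqrt(l) per iteration
        steps += 1
    return steps

for n in (100, 10_000, 1_000_000):
    print(n, iterations_to_reach(n), round(iterations_to_reach(n) / sqrt(n), 3))
```

The ratio of the step count to $\sqrt{n}$ stays bounded (between 2 and 3 for these inputs), which is consistent with an $O(\sqrt{n})$ number of iterations per start position.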

PROOF. Let $T(r)$ be the number of iterations of the loop over $l$ required for $l$ to reach $r$. We have shown in Lemma 5 that in each iteration, the number of substrings skipped $x$ is $\omega(\sqrt{l})$. Thus, $l$ in the next iteration will reach from $r$ to $r + \omega(\sqrt{r})$. This gives us the following recursive relation, where $q$ is a constant:

\[ T(r + \sqrt{r}) = T(r) + O(1) = T(r) + q \tag{36} \]

It can be shown that the solution to the above relation is $O(\sqrt{n})$. Please refer to Lemma 7 in the appendix for the detailed proof.

Since the loop over $l$ in line 3 runs for $O(\sqrt{n})$ iterations, each taking $O(k)$ time, and the loop over $i$ iterates $n$ times, the time taken by the algorithm on an input string generated by the null model is $O(k n^{3/2})$, which is $O(n^{3/2})$ since $k$ is taken as a constant in our problem setting. Thus, we have shown that the running time of the algorithm on an input string generated from a memoryless Bernoulli model is $O(k n^{3/2})$ with high probability.

Algorithm 2 Algorithm for finding the top-$t$ substrings
 1: $T \leftarrow$ Min-Heap on $t$ elements all initialized to 0
 2: for $i \leftarrow n$ to $1$ do
 3:   for $l \leftarrow 0$ to $n - i$ do
 4:     $l' \leftarrow i + l$
 5:     $X^2_{max_t} \leftarrow$ Find_Min($T$)
 6:     $X^2_l \leftarrow$ $X^2$ value of $S[i \ldots l']$
 7:     if $X^2_l > X^2_{max_t}$ then
 8:       Extract_Min($T$)
 9:       Insert $X^2_l$ in $T$
10:     end if
11:     $t \leftarrow m$ such that, over all $m \in \{1, 2, \ldots, k\}$, $\frac{2Y_m + x}{p_m}$ is maximum
12:     $a \leftarrow 1 - p_t$
13:     $b \leftarrow 2Y_t - 2 l p_t - p_t X^2_{max_t}$
14:     $c \leftarrow (X^2_l - X^2_{max_t})\, l\, p_t$
15:     $x \leftarrow \frac{-b + \sqrt{b^2 - 4ac}}{2a}$
16:     Increment $l$ by $x$
17:   end for
18:   $i \leftarrow i - 1$
19: end for
20: return $T$

Algorithm 3 Algorithm for finding all substrings having $X^2$ value greater than $\alpha_0$
 1: $S_{\alpha_0} \leftarrow \phi$
 2: for $i \leftarrow n$ to $1$ do
 3:   for $l \leftarrow 0$ to $n - i$ do
 4:     $l' \leftarrow i + l$
 5:     $X^2_l \leftarrow$ $X^2$ value of $S[i \ldots l']$
 6:     if $X^2_l > \alpha_0$ then
 7:       $S_{\alpha_0} \leftarrow S_{\alpha_0} \cup S[i \ldots l']$
 8:     end if
 9:     $t \leftarrow m$ such that, over all $m \in \{1, 2, \ldots, k\}$, $\frac{2Y_m + x}{p_m}$ is maximum
10:     $a \leftarrow 1 - p_t$
11:     $b \leftarrow 2Y_t - 2 l p_t - p_t \alpha_0$
12:     $c \leftarrow (X^2_l - \alpha_0)\, l\, p_t$
13:     $x \leftarrow \max\left\{\frac{-b + \sqrt{b^2 - 4ac}}{2a} \text{ (rounded to an integer)},\; 1\right\}$
14:     Increment $l$ by $x$
15:   end for
16:   $i \leftarrow i - 1$
17: end for
18: return $S_{\alpha_0}$

5.1 Nature of the String

As can be verified from the definition, the $X^2$ value of a substring increases when the expected and observed frequencies begin to diverge. Thus, the individual substrings of a string not generated from the null model are expected to have higher $X^2$ values which, in turn, increases $X^2_{max}$.
Further, it can be verified from (22) that the number of substrings skipped, $x$, increases on increasing $X^2_{max}$, as we have to maximize $x$ such that the constraint $X^2_\lambda \le X^2_{max}$ is satisfied. If $X^2_{max}$ is large, it gives a larger window for $X^2_\lambda$, which allows the choice of a larger $x$. Hence, the time taken by our algorithm on an input string not generated from the null model is at most the time taken by our algorithm on an equivalent string of the same size generated from the null model. So, the time complexity of our algorithm remains $O(n^{3/2})$ and is independent of the nature of the input string. Section 7.1.2 gives the details on how our algorithms perform on different types of strings.

6. OTHER VARIANTS OF THE PROBLEM

6.1 Top-t Substrings

The algorithm for finding the top-t statistically significant substrings (Algorithm 2) is the same as the algorithm for finding the MSS, except that $X^2_{max_t}$ stores the $t$-th largest $X^2$ value among all substrings seen by the algorithm up to that instant. We maintain a min-heap $T$ of size $t$ for storing the top-t $X^2$ values seen by the algorithm. The heap $T$ initially holds no substring values (its $t$ slots are initialized to 0 in Algorithm 2), and $X^2_{max_t}$ always stores the top (minimum) element of the heap. If $X^2_l$ is computed to be greater than $X^2_{max_t}$, then we extract the minimum element of $T$ (which is no longer a part of the top-t substrings) and insert the new $X^2_l$ value into the heap. Now, $X^2_{max_t}$ points to the new minimum of the heap. Finally, at the end of the algorithm, we return the heap $T$, which contains the top-t $X^2$ values among all the substrings of string $S$.

The analysis of this algorithm is the same as that of the algorithm for the MSS, except that we now need to show that $X^2_{max_t}$ is greater than $\ln n$ with probability greater than any constant. This still holds true for any $t < \omega(n)$ (please refer to Lemma 8 in the appendix for the detailed proof). Moreover, inside the for loop on $l$, we now perform insertion and extract-min operations on a heap $T$ of size $t$; so each iteration of the loop over $l$ now requires $O(k + \log t)$ time.
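The heap maintenance described for the top-t variant can be sketched with Python's `heapq` module. This is a minimal illustration, assuming the $X^2$ values arrive as a plain stream; the function name is hypothetical and the skipping logic is deliberately left out.

```python
import heapq

def top_t_values(values, t):
    """Keep the t largest values of a stream in a min-heap of size t."""
    heap = [0.0] * t  # t slots initialized to 0, as in line 1 of Algorithm 2
    for v in values:
        if v > heap[0]:                 # heap[0] plays the role of X^2_max_t
            heapq.heapreplace(heap, v)  # extract-min followed by insert, O(log t)
    return sorted(heap, reverse=True)
```

For example, `top_t_values([3.0, 1.0, 4.0, 1.5, 9.0, 2.6], 3)` keeps 9.0, 4.0 and 3.0, and each update costs $O(\log t)$, which is where the extra $\log t$ term in the per-iteration cost comes from.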
Thus, the total time complexity of the algorithm for finding the top-t substrings is $O((k + \log t)\, n^{3/2})$ for $t < \omega(n)$.

6.2 Significance Greater Than a Threshold

The algorithm for finding all substrings having $X^2$ value greater than a threshold $\alpha_0$ (Algorithm 3) is again essentially the same as the MSS algorithm, except that $X^2_{max}$ constantly remains $\alpha_0$ at every iteration. We maintain $S_{\alpha_0}$ as the set of all substrings having $X^2$ value greater than $\alpha_0$. We skip all substrings that cannot be a part of $S_{\alpha_0}$, i.e., those whose cover strings have $X^2$ value not greater than $\alpha_0$.

Next, we analyze the time complexity of the algorithm on varying $\alpha_0$. We again revert to (22):

$x = \max\left\{ \left\lceil \frac{-b + \sqrt{b^2 - 4ac}}{2a} \right\rceil, 1 \right\}$ (37)

where $a = 1 - p_t > 0$, $b = 2Y_t - 2 l p_t - p_t \alpha_0$ and $c = (X^2_l - \alpha_0)\, l p_t$. If $\alpha_0 < X^2_l$, then $c$ in the above equation is positive. Consequently, as $x$ takes the value 1, the number of iterations of the loop on $l$ is $O(n)$. Hence, the time complexity of the algorithm is $O(kn^2)$. However, the time complexity decreases sharply on

increasing $\alpha_0$. Once $\alpha_0$ becomes sufficiently greater than $X^2_l$, the term $\alpha_0 l p_t$ starts predominating $b$, and $x$ in each step is effectively $\sqrt{-c/a}$, which is $O(\sqrt{\alpha_0 l})$ (see footnote 6). Hence, the recurrence relation for the number of iterations of the loop on $l$ in this case is

$T(l + O(\sqrt{\alpha_0 l})) = T(l) + 1$ (38)

It can again be shown with the help of Lemma 7 in the appendix that the solution to this recursive relation is $O(\sqrt{l/\alpha_0})$. So the total time complexity of the algorithm is $O(kn\sqrt{n/\alpha_0})$.

6.3 MSS Greater Than a Given Length

The algorithm for finding the most significant substring among all substrings having length greater than a given length $\Gamma_0$ is exactly the same as the MSS algorithm, except that now we ignore any substring whose length is not greater than $\Gamma_0$. This means the loop on $l$ starts with $\Gamma_0$ instead of 0 and the loop on $i$ goes on till $n - \Gamma_0$ instead of $n$. The time complexity of the algorithm decreases not just because fewer substrings are evaluated in this case, but also because the skip $x$ in our algorithm is a function of $l$ and increases with increasing values of $l$. Hence, the recursive relation for the loop over $l$ in this case is the same, with only the base case different: $T(\Gamma_0) = 1$ instead of $T(1) = 1$. The solution to this recurrence relation is $O(\sqrt{n} - \sqrt{\Gamma_0})$. Since there are $n - \Gamma_0$ iterations of the loop on $i$, the total time complexity of the algorithm is $O(k(n - \Gamma_0)(\sqrt{n} - \sqrt{\Gamma_0}))$, which is effectively $O(kn^{3/2})$.

[Figure 1: Analysis of time complexity for finding the MSS. (a) Number of iterations with string length $n$ ($k = 2$). (b) Number of iterations with alphabet size $k$. Axes: Ln Iter versus Ln n. Series in (a): Our Algorithm, Trivial Algorithm, $O(n^{1.5})$.]

[Figure 2: Variation of $X^2_{max}$ with string length $n$ ($k = 2$). Axes: $X^2_{max}$ versus Ln n.]

Footnote 6: In a substring generated from a memoryless Bernoulli distribution, $X^2$ follows a $\chi^2$ distribution with constant mean and variance.
Hence, it can be shown with high probability that $X^2_l$ is a small constant.

[Figure 3: $X^2_{max}$ and number of iterations for different multinomial strings. $S_1$: $n = 10^4$, $k = 3$, $P = \{p_0, 0.5 - p_0, 0.5\}$; $S_2$: $n = 10^4$, $k = 5$, $P = \{p_0, 0.5 - p_0, 0.1, 0.2, 0.2\}$.]

7. EXPERIMENTAL ANALYSES AND APPLICATIONS

The experimental results shown in this section are for C codes run on a Macintosh platform on a machine with a 2.3 GHz Intel dual core processor and 4 GB of 1333 MHz RAM. Each character of a synthetic string was generated independently from the assumed underlying distribution using the standard uniform (0, 1) random number generator in C.

7.1 Synthetic Datasets

7.1.1 Time Complexity of Finding MSS

The first experiment is on the time complexity of our algorithm for finding the most significant substring. Figure 1a depicts the comparison of the number of iterations required by our algorithm vis-à-vis the trivial algorithm for input strings of different lengths ($n$) generated from the null model for an alphabet of size 2. The number of iterations of our algorithm, when plotted on a logarithmic scale, increases linearly with the logarithm of the string size with a slope close to 1.5. Hence, we can claim that the empirical time complexity of our algorithm for an input string generated by the null model is also $O(n^{1.5})$.

The effect of varying the alphabet size is shown in Figure 1b for different string lengths. It can be observed that, as expected, varying the alphabet size has no significant effect on the number of iterations of the algorithm.

Figure 2 shows that the expected $X^2_{max}$ increases linearly with $\ln n$ with slope 2, which supports our claim in Lemma 4 that for sufficiently large $n$, $X^2_{max}$ is greater than $\ln n$ with high probability.

Finally, Figure 3 plots the variation of $X^2_{max}$ and iterations of the loop over $l$ for different heterogeneous multinomial distributions

and different alphabet sizes. It is evident that a change in the probability $p_0$ of occurrence of character $a_0$ only changes $X^2_{max}$ but has no significant effect on the number of iterations taken by our algorithm. It can be intuitively seen that the change in $p_0$ is effectively canceled out by the change in $X^2_{max}$, so the number of characters skipped ($x$ in Eq. (22)) remains roughly the same.

[Figure 4: Comparison of time taken by our algorithm on strings not generated by the null model. (a) Varying $n$ ($k = 5$). (b) Varying $k$ ($n = 20000$). Vertical axis: iterations in millions; horizontal axes: string length and alphabet size.]

7.1.2 Strings Not Generated Using the Null Model

We now investigate the results for input strings not generated from the null model, in addition to an equivalent-length input string generated from the null model, which is a memoryless Bernoulli source where the multinomial probabilities of all the characters are equal. The different types of other strings that we compare are:

(a) Geometric string: A string generated from a memoryless multinomial Bernoulli source where the multinomial probabilities of all the characters are different. The probability of occurrence of a character decreases geometrically. Hence, the probability of occurrence of character $a_i$ is proportional to $1/2^i$.

(b) Harmonic string: A string generated from a memoryless multinomial Bernoulli source where the multinomial probabilities of all the characters are different. The probability of occurrence of a character decreases harmonically. Hence, the probability of occurrence of character $a_i$ is proportional to $1/i$.

(c) Markov string: A string generated by a Markov process, i.e., the occurrence of a character depends on the previous character. The state transition probability of character $a_j$ following character $a_i$ is proportional to $1/2^{(i-j) \bmod k}$.

The number of iterations for our algorithm on different values of string length ($n$) and alphabet size ($k$) are plotted in Figure 4.
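The three non-null string types listed above can be generated along the following lines. This is a hedged sketch, not the generator used in the experiments (which were written in C): the function names are hypothetical, characters are represented by 0-based integer indices, and the modulus is assumed to be non-negative, as Python's `%` operator guarantees.

```python
import random

def geometric_probs(k):
    """P(a_i) proportional to 1 / 2**i for i = 1..k."""
    weights = [1.0 / 2 ** i for i in range(1, k + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def harmonic_probs(k):
    """P(a_i) proportional to 1 / i for i = 1..k."""
    weights = [1.0 / i for i in range(1, k + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def markov_transition(k):
    """Row i-1 holds P(a_j follows a_i), proportional to 1 / 2**((i - j) mod k)."""
    rows = []
    for i in range(1, k + 1):
        weights = [1.0 / 2 ** ((i - j) % k) for j in range(1, k + 1)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def sample_memoryless(probs, n, rng):
    """Draw n independent character indices from the given distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=n)

def sample_markov(trans, n, rng):
    """Draw a length-n chain (n >= 1) with a uniformly random first character."""
    state = rng.randrange(len(trans))
    out = [state]
    for _ in range(n - 1):
        state = rng.choices(range(len(trans)), weights=trans[state], k=1)[0]
        out.append(state)
    return out
```

A typical call would be `sample_markov(markov_transition(5), 20000, random.Random(0))`; passing an explicit `random.Random` instance keeps the experiments reproducible.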
It can be verified that in all the cases, the string generated using the null model requires the maximum number of iterations, which is in accordance with our theoretical claim in Section 5. The time taken by our algorithm on an input string not generated from a null model is upper bounded by the time taken on an equivalent-size input string generated from the null model. This verifies that the time complexity of our algorithm is $O(kn^{3/2})$, independent of the type of the input string.

[Figure 5: Analysis of time complexity for finding the top-t set. (a) Number of iterations with string length $n$. (b) Number of iterations with $t$. Axes: Ln Time (in µs) versus Ln n in (a) and versus Ln t in (b).]

7.2 Other Variants

7.2.1 Top-t Significant Substrings

The time taken by the algorithm for finding the top-t set on varying string lengths for different values of $t$ is shown in Figure 5a. The linear increment in logarithmic scale with slope 1.5 verifies that for any constant $t$, the time taken by our algorithm to find the top-t set is again $O((k + \log t)\, n^{1.5})$. The time taken for different $t$ is shown in Figure 5b. The plot shows that till $t < \omega(n)$, the running time increases with slope 1.5, but once $t$ crosses the limit, the slope starts increasing towards 2. This is in agreement with our theoretical analysis in Section 6.1.

7.2.2 Significance Greater Than a Threshold

Figure 6 depicts the number of iterations taken by the algorithms for finding all substrings greater than a threshold $\alpha_0$. As discussed in Section 6.2, the iterations decrease very sharply from $O(n^2)$ until $\alpha_0 = O(X^2_{max})$, after which they gradually decrease (as a function of $1/\sqrt{\alpha_0}$).

7.2.3 Substrings Greater Than a Given Length

The number of iterations taken by the algorithms for finding the MSS among all strings of length greater than $\Gamma_0$ is shown in Figure 7. As discussed in Section 6.3, the number of iterations slowly decreases as $\Gamma_0$ tends to $n$ before rapidly approaching 0.

[Figure 6: Number of iterations with $\alpha_0$ ($n = 10^5$, $k = 2$). Axes: Ln Iter versus $\alpha_0$. Series: Trivial Algorithm, Our Algorithm.]

[Figure 7: Number of iterations with $\Gamma_0$ ($n = 10^5$, $k = 2$). Axes: Ln Iter versus Ln $\Gamma_0$. Series: Trivial Algorithm, Our Algorithm.]

7.3 Comparison with Existing Techniques

Table 1 presents the comparative results of our algorithm with the existing algorithms [13] for two different values of string size (averaged over different runs). As expected, the results indicate that ARLM [13], being $O(n^2)$, does not scale well for larger strings, as opposed to our algorithm. AGMM [13], being $O(n)$ time, is very fast and outperforms all the algorithms in terms of time taken. However, being just a heuristic with no theoretical guarantee, it does not always lead to a solution that is close to the optimal. As can be verified from Table 1, the average $X^2_{max}$ value found by AGMM is significantly lower than the average $X^2_{max}$ value found by the other algorithms. Further, since there are no guarantees on the lower bound of the $X^2_{max}$ value found by it relative to the optimal $X^2_{max}$ value, AGMM can lead to quite poor solutions on some real datasets which are not as well behaved as the synthetic ones (Section 7.5). Finally, our algorithm requires only 3 seconds for a string of length as large as [...], which signifies that the algorithm is practical for real-life scenarios.

7.4 Application in Cryptology

The correlation between adjacent symbols is of central importance in many cryptology applications [1]. The objective of a random number generator is to draw symbols from the null model. The independence of consecutive symbols is an important criterion for the efficiency of a random number generator [1]. We define the correlation between adjacent symbols in terms of the state transition probability. An ideal random binary string generator should generate the same symbol in the next step with probability exactly 0.5. However, some inefficient random number generators might be biased towards generating the same symbol again with probability more than 0.5.
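A biased generator of the kind described above can be simulated as follows. This is a sketch under stated assumptions: the names `sticky_bits` and `repeat_fraction` are hypothetical, and the bias is modeled as a single parameter giving the probability of repeating the previous symbol.

```python
import random

def sticky_bits(n, p_repeat, rng):
    """Binary string (n >= 1) in which each symbol repeats the previous one with probability p_repeat."""
    bits = [rng.randrange(2)]
    for _ in range(n - 1):
        prev = bits[-1]
        bits.append(prev if rng.random() < p_repeat else 1 - prev)
    return bits

def repeat_fraction(bits):
    """Empirical estimate of the repeat probability from adjacent pairs."""
    pairs = list(zip(bits, bits[1:]))
    return sum(1 for a, b in pairs if a == b) / len(pairs)
```

With `p_repeat = 0.5` the output behaves like the null model; values above 0.5 produce the long runs that show up as substrings with unusually large $X^2$ values.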
Table 2 shows the comparison of $X^2_{max}$ for different lengths $n$ of string and different probabilities $p$ of generation of the same symbol in the next iteration.

[Table 1: Comparison with other techniques for synthetic datasets. Columns: Algo, String Size, Avg $X^2_{max}$, Avg Time (s). Rows: Trivial, Our, ARLM, AGMM for each of the two string sizes.]

[Table 2: Variation of $X^2_{max}$ with $n$ and $p$.]

It can be verified from the data that $X^2_{max}$ is minimum for a string generated with $p = 0.5$ and increases with increasing $p$. Further, Figure 2 plots the variation of $X^2_{max}$ of a string generated using the null model with the logarithm of the string length ($\ln n$). We observe a clear linear trend with slope 2. This $X^2_{max}$ value can be used as a benchmark for a string of any length to measure the deviation from the null model. If the observed $X^2_{max}$ value of a string deviates significantly from the benchmark, it means that the string generated is not completely random but contains some kind of hidden correlation among the symbols. One of the major advantages of using the algorithm is in a scenario where only a substring of a string might deviate from the random behavior. Our algorithm will be able to capture such a substring without having to examine all the possible substrings.

7.5 Real Datasets

7.5.1 Analysis of Sports Data

The chi-square statistic can be used to find the best or worst career patches of sports teams or professionals. Boston Red Sox versus New York Yankees is one of the most famous and fiercest rivalries in professional sports [11]. They have competed against each other in over two thousand Major League Baseball games over a period of [...]0 years. Yankees have won 113 (54.7%) of those games. However, we would like to analyze the time periods in which either the Yankees or the Red Sox were particularly dominant against the other. The dominant periods should have a large win ratio for a team over a sufficiently long stretch of games.
If we encode the results in the form of a binary string whose letters denote a win or loss for a team, then these sufficiently long periods will contain results that significantly differ from the expected or average. Consequently, the $X^2$ value for the dominant periods will significantly differ from 0. We use the dataset obtained from [...]. The top five most significant patches found by our algorithm have been summarized in Table 3. The best period for the Yankees was from the mid 1920s to the early 1930s, in which they won more than 75% of the games. It was clearly the era of Yankees dominance, in which they won 26 World Series championships and 39 pennants, compared to only 4 pennants for the Red Sox [11]. Alternatively, the best patch for the Red Sox was a two-year period around 1912 in which they had close to a 90% winning record; this is also referred to as the glory period in Red Sox history [11].

Footnote 7: Such substrings will tend to exhibit large $X^2$ values and, hence, will be captured by our algorithm.

[Table 3: Performance of Yankees against Red Sox. Columns: Start, End, $X^2$ val, Games, Wins, Win%.]

[Table 4: Comparison with other techniques for sports data. Columns: Algorithm, $X^2$ val, Start, End, Time (s). Rows: Trivial, Our, ARLM, AGMM.]

The comparative results of our algorithm with existing techniques are summarized in Table 4. As expected, our algorithm and ARLM find the optimal solution, but our algorithm outperforms the trivial algorithm and is almost as good as ARLM in terms of time (due to the relatively small string size). Moreover, though AGMM is faster, it does not find the optimal solution. The best period found by AGMM was the second best (see Table 3) and has a significantly lower $X^2$ value.

7.5.2 Analysis of Stock Returns

Most financial models are based on the random walk hypothesis, which is consistent with the efficient-market hypothesis [6]. They assume that stock market prices evolve according to a random walk with a constant drift and, thus, that the prices of the stock market cannot be predicted (see footnote 8). We analyze the returns of three generic financial securities for which long historical data are available. The Dow Jones Industrial Average is one of the oldest stock market indices and shows the performance of 30 large publicly owned companies in the United States. Similarly, the S&P 500 is another large capitalization-weighted index that captures the performance of 500 actively traded large-cap common stocks. Finally, the IBM common stock is representative of one of the oldest and largest publicly owned firms. We run the algorithms on the Dow Jones prices obtained from the year 1928 onwards (20906 days), S&P 500 from 1950 onwards (15600 days) and IBM from 1962 onwards (12517 days). The daily price data are obtained from finance.yahoo.com. Given the randomness in the stock prices, we assume that the prices can increase (or decrease) each day with a fixed probability. The fixed probability is calculated as the ratio of days on which the price went up (or down) to the total number of trading days.
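The fixed up-move probability described above can be estimated from a daily price series along these lines. This is a minimal sketch with hypothetical function names; it assumes a day counts as "up" only when the price is strictly higher than on the previous day, and it uses the two-outcome form of the chi-square statistic.

```python
def encode_moves(prices):
    """Mark each day 1 if the price rose relative to the previous day, else 0."""
    return [1 if today > prev else 0 for prev, today in zip(prices, prices[1:])]

def up_probability(bits):
    """Ratio of up days to the total number of trading days."""
    return sum(bits) / len(bits)

def chi_square_binary(bits, p_up):
    """X^2 of a window of up/down days against the fixed probability p_up."""
    n = len(bits)
    ups = sum(bits)
    downs = n - ups
    return ups ** 2 / (n * p_up) + downs ** 2 / (n * (1.0 - p_up)) - n
```

By construction, the statistic of the whole series against its own estimated probability is 0, so only windows whose up/down mix departs from the long-run ratio score highly.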
We find the statistically significant substrings of the binary string encoded with 1 for a day on which the price of the security went up and 0 otherwise. These substrings correspond to significantly long periods that contain an unusually large ratio of days on which the stock price moved in one direction. The results are summarized in Table 5. A lot of bad periods occurred during the Great Depression of the 1930s, the recent dot-com bubble burst and the mortgage recession periods of the last decade, whereas a number of good periods occurred during the economic boom of the 1950s and 1960s. These observations verify that these statistically significant periods do not occur just due to randomness or chance alone, but are consequences of external factors as well. The identification of such significant patterns can help in identifying the relevant external factors. Finally, the $X^2$ values of these substrings can also be used in quantifying the historical risk of the securities, which is one of the most important parameters that investment managers like to control.

Footnote 8: If the stock prices can be predicted, then there is an arbitrage in the market, which violates the efficient market hypothesis.

[Table 5: Significant periods for the securities. Columns: Periods (Good, Bad), Security (Dow Jones, S&P, IBM), Start, End, Change.]

[Table 6: Comparison with other techniques for stock returns. Columns: Algo, Sec., $X^2$, Start, End, Change, Time. Surviving running times: Dow: Trivial 14.[...]s, Our 0.89s, ARLM 4.15s, AGMM 0.03s; S&P: Trivial 9.36s, Our 0.63s, ARLM [...].87s, AGMM 0.03s.]

The comparative performance of our algorithm vis-à-vis the other techniques in finding the period with the highest $X^2$ value is summarized in Table 6. Again, as expected, our algorithm, the trivial algorithm and ARLM find the same period, for which the $X^2$ value is maximized. However, in this case, the time performance advantage of our algorithm over ARLM is quite apparent. AGMM, though having the time advantage, does quite poorly in terms of identifying the maximum $X^2$ substring.
Especially for S&P 500, it returns a substring that is not even close to the top few substrings.

8. CONCLUSIONS AND FUTURE WORK

In this paper, we chose to analyze the $X^2$ statistic in the context of a memoryless Bernoulli model. We experimentally saw that for a string drawn from such a model, the chi-square value of the most significant substring increases asymptotically as $2 \ln n$, where $n$ is the length of the string. However, the rigorous mathematical proof remains an interesting open problem. Such analysis of asymptotic behavior has significant applications in deciding the confidence interval with which the null hypothesis is rejected. The analysis can be further extended to strings generated from Markov models, the most basic of which is the case when there is a correlation between adjacent characters. The single-dimensional problem of identification of the most significant substring can be extended to two-dimensional grid networks as well as general graphs. One potentially interesting application is in financial time series analysis of two securities that might not be very correlated in general, but might point to significant correlations during certain specific events such as recession. Such correlations are essential to most risk analysis techniques.

9. REFERENCES

[1] M. Abramowitz and I. Stegun. Handbook of Mathematical Functions. Wiley, [...].
[2] S. Agarwal. On finding the most statistically significant substring using the chi-square measure. Master's thesis, Indian Institute of Technology, Kanpur, 2009.
[3] M. Atallah, R. Gwadera, and W. Szpankowski. Detection of significant sets of episodes in event sequences. In ICDM, pages 3 [...]


TOWARD THE DEVELOPMENT OF A METHODOLOGY FOR DESIGNING HELICOPTER FLIGHT CONTROL LAWS BY INTEGRATING HANDLING QUALITIES REQUIREMENTS TOWARD THE DEVELOPMENT OF A METHODOLOGY FOR DESIGNING HELICOPTER FLIGHT CONTROL LAWS BY INTEGRATING HANDLING QUALITIES REQUIREMENTS Jean-Charles Antonioli ONERA Air Base 70 366 Salon de Provene Frane Armin

More information

Complexity of Regularization RBF Networks

Complexity of Regularization RBF Networks Complexity of Regularization RBF Networks Mark A Kon Department of Mathematis and Statistis Boston University Boston, MA 02215 mkon@buedu Leszek Plaskota Institute of Applied Mathematis University of Warsaw

More information

Simple Cyclic loading model based on Modified Cam Clay

Simple Cyclic loading model based on Modified Cam Clay Simle li loading model based on Modified am la Imlemented in RISP main rogram version 00. and higher B Amir Rahim, The RISP onsortium Ltd Introdution This reort resents a simle soil model whih rovides

More information

Real-time Hand Tracking Using a Sum of Anisotropic Gaussians Model

Real-time Hand Tracking Using a Sum of Anisotropic Gaussians Model Real-time Hand Traking Using a Sum of Anisotroi Gaussians Model Srinath Sridhar 1, Helge Rhodin 1, Hans-Peter Seidel 1, Antti Oulasvirta 2, Christian Theobalt 1 1 Max Plank Institute for Informatis Saarbrüken,

More information

The Effectiveness of the Linear Hull Effect

The Effectiveness of the Linear Hull Effect The Effetiveness of the Linear Hull Effet S. Murphy Tehnial Report RHUL MA 009 9 6 Otober 009 Department of Mathematis Royal Holloway, University of London Egham, Surrey TW0 0EX, England http://www.rhul.a.uk/mathematis/tehreports

More information

CMSC 451: Lecture 9 Greedy Approximation: Set Cover Thursday, Sep 28, 2017

CMSC 451: Lecture 9 Greedy Approximation: Set Cover Thursday, Sep 28, 2017 CMSC 451: Leture 9 Greedy Approximation: Set Cover Thursday, Sep 28, 2017 Reading: Chapt 11 of KT and Set 54 of DPV Set Cover: An important lass of optimization problems involves overing a ertain domain,

More information

A NETWORK SIMPLEX ALGORITHM FOR THE MINIMUM COST-BENEFIT NETWORK FLOW PROBLEM

A NETWORK SIMPLEX ALGORITHM FOR THE MINIMUM COST-BENEFIT NETWORK FLOW PROBLEM NETWORK SIMPLEX LGORITHM FOR THE MINIMUM COST-BENEFIT NETWORK FLOW PROBLEM Cen Çalışan, Utah Valley University, 800 W. University Parway, Orem, UT 84058, 801-863-6487, en.alisan@uvu.edu BSTRCT The minimum

More information

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson & J. Fischer) January 21,

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson & J. Fischer) January 21, Sequene Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson & J. Fisher) January 21, 201511 9 Suffix Trees and Suffix Arrays This leture is based on the following soures, whih are all reommended

More information

The Constrainedness of Search. constrainedness. Surprisingly, we can often characterize. an ensemble of problems using just two parameters:

The Constrainedness of Search. constrainedness. Surprisingly, we can often characterize. an ensemble of problems using just two parameters: In Proeedings of AAAI-96. Amerian Assoiation for Artiial Intelligene, 1996. The Constrainedness of Searh Ian P. Gent and Ewan MaIntyre and Patrik Prosser Deartment of Comuter Siene University of Strathlyde

More information

Chapter 8 Hypothesis Testing

Chapter 8 Hypothesis Testing Leture 5 for BST 63: Statistial Theory II Kui Zhang, Spring Chapter 8 Hypothesis Testing Setion 8 Introdution Definition 8 A hypothesis is a statement about a population parameter Definition 8 The two

More information

Lightpath routing for maximum reliability in optical mesh networks

Lightpath routing for maximum reliability in optical mesh networks Vol. 7, No. 5 / May 2008 / JOURNAL OF OPTICAL NETWORKING 449 Lightpath routing for maximum reliability in optial mesh networks Shengli Yuan, 1, * Saket Varma, 2 and Jason P. Jue 2 1 Department of Computer

More information

UPPER-TRUNCATED POWER LAW DISTRIBUTIONS

UPPER-TRUNCATED POWER LAW DISTRIBUTIONS Fratals, Vol. 9, No. (00) 09 World Sientifi Publishing Company UPPER-TRUNCATED POWER LAW DISTRIBUTIONS STEPHEN M. BURROUGHS and SARAH F. TEBBENS College of Marine Siene, University of South Florida, St.

More information

, given by. , I y. and I z. , are self adjoint, meaning that the adjoint of the operator is equal to the operator. This follows as A.

, given by. , I y. and I z. , are self adjoint, meaning that the adjoint of the operator is equal to the operator. This follows as A. Further relaxation 6. ntrodution As resented so far the theory is aable of rediting the rate of transitions between energy levels i.e. it is onerned with oulations. The theory is thus erfetly aetable for

More information

18.05 Problem Set 6, Spring 2014 Solutions

18.05 Problem Set 6, Spring 2014 Solutions 8.5 Problem Set 6, Spring 4 Solutions Problem. pts.) a) Throughout this problem we will let x be the data of 4 heads out of 5 tosses. We have 4/5 =.56. Computing the likelihoods: 5 5 px H )=.5) 5 px H

More information

silicon wafers p b θ b IR heater θ h p h solution bath cleaning filter bellows pump

silicon wafers p b θ b IR heater θ h p h solution bath cleaning filter bellows pump An Analysis of Cometitive Assoiative Net for Temerature Control of RCA Cleaning Solutions S Kurogi y, H Nobutomo y, T Nishida y, H Sakamoto y, Y Fuhikawa y, M Mimata z and K Itoh z ykyushu Institute of

More information

7 Max-Flow Problems. Business Computing and Operations Research 608

7 Max-Flow Problems. Business Computing and Operations Research 608 7 Max-Flow Problems Business Computing and Operations Researh 68 7. Max-Flow Problems In what follows, we onsider a somewhat modified problem onstellation Instead of osts of transmission, vetor now indiates

More information

Emergence of Superpeer Networks: A New Perspective

Emergence of Superpeer Networks: A New Perspective Emergene o Suereer Networs: A New Persetive Bivas Mitra Deartment o Comuter Siene & Engineering Indian Institute o Tehnology, Kharagur, India Peer to Peer arhiteture Node Node Node Internet Node Node All

More information

The Spread of Internet Worms and the Optimal Patch Release Strategies

The Spread of Internet Worms and the Optimal Patch Release Strategies Assoiation for Information Systems AIS Eletroni Library (AISeL) AMCIS 5 Proeedings Amerias Conferene on Information Systems (AMCIS) 5 The Sread of Internet Worms and the Otimal Path Release Strategies

More information

Tight bounds for selfish and greedy load balancing

Tight bounds for selfish and greedy load balancing Tight bounds for selfish and greedy load balaning Ioannis Caragiannis Mihele Flammini Christos Kaklamanis Panagiotis Kanellopoulos Lua Mosardelli Deember, 009 Abstrat We study the load balaning problem

More information

Optimization of Statistical Decisions for Age Replacement Problems via a New Pivotal Quantity Averaging Approach

Optimization of Statistical Decisions for Age Replacement Problems via a New Pivotal Quantity Averaging Approach Amerian Journal of heoretial and Applied tatistis 6; 5(-): -8 Published online January 7, 6 (http://www.sienepublishinggroup.om/j/ajtas) doi:.648/j.ajtas.s.65.4 IN: 36-8999 (Print); IN: 36-96 (Online)

More information

Experimental study with analytical validation of thermo-hydraulic performance of plate heat exchanger with different chevron angles

Experimental study with analytical validation of thermo-hydraulic performance of plate heat exchanger with different chevron angles Exerimental study with analytial validation of thermo-hydrauli erformane of late heat exhanger with different hevron angles Arunahala U C, Ajay R K and Sandee H M Abstrat The Plate heat exhanger is rominent

More information

SYNTHESIS AND OPTIMIZATION OF EPICYCLIC-TYPE AUTOMATIC TRANSMISSIONS BASED ON NOMOGRAPHS

SYNTHESIS AND OPTIMIZATION OF EPICYCLIC-TYPE AUTOMATIC TRANSMISSIONS BASED ON NOMOGRAPHS Al-Qadisiya Journal For Engineering Sienes Vol. 4 o. 3 Year 0 SYTHESIS AD OPTIMIZATIO OF EPICYCLIC-TYPE AUTOMATIC TASMISSIOS BASED O OMOGAPHS E. L. Esmail K. H. Salih Leturer Assistant Leturer Deartment

More information

Selection of 'optimal' poles for SISO pole placement design: SRL LQR design example Prediction and current estimators

Selection of 'optimal' poles for SISO pole placement design: SRL LQR design example Prediction and current estimators L5: Leture 5 Symmetri Root Lous LQR Design State Estimation Seletion of 'otimal' oles for SISO ole laement design: SRL LQR design examle Predition and urrent estimators L5:2 Otimal ole laement for SISO

More information

15.12 Applications of Suffix Trees

15.12 Applications of Suffix Trees 248 Algorithms in Bioinformatis II, SoSe 07, ZBIT, D. Huson, May 14, 2007 15.12 Appliations of Suffix Trees 1. Searhing for exat patterns 2. Minimal unique substrings 3. Maximum unique mathes 4. Maximum

More information

Symmetric Root Locus. LQR Design

Symmetric Root Locus. LQR Design Leture 5 Symmetri Root Lous LQR Design State Estimation Seletion of 'otimal' oles for SISO ole laement design: SRL LQR design examle Predition and urrent estimators L5:1 L5:2 Otimal ole laement for SISO

More information

A Spatiotemporal Approach to Passive Sound Source Localization

A Spatiotemporal Approach to Passive Sound Source Localization A Spatiotemporal Approah Passive Sound Soure Loalization Pasi Pertilä, Mikko Parviainen, Teemu Korhonen and Ari Visa Institute of Signal Proessing Tampere University of Tehnology, P.O.Box 553, FIN-330,

More information

The Effect of Gauge Measurement Capability and Dependency Measure of Process Variables on the MC p

The Effect of Gauge Measurement Capability and Dependency Measure of Process Variables on the MC p Journal of Industrial and Systems Engineering Vol. 4, No., 59-76 Sring The Effet of Gauge easurement Caability and Deendeny easure of Proess Variables on the C Daood Shishebori, Ali Zeinal Hamadani Deartment

More information

General Equilibrium. What happens to cause a reaction to come to equilibrium?

General Equilibrium. What happens to cause a reaction to come to equilibrium? General Equilibrium Chemial Equilibrium Most hemial reations that are enountered are reversible. In other words, they go fairly easily in either the forward or reverse diretions. The thing to remember

More information

State Diagrams. Margaret M. Fleck. 14 November 2011

State Diagrams. Margaret M. Fleck. 14 November 2011 State Diagrams Margaret M. Flek 14 November 2011 These notes over state diagrams. 1 Introdution State diagrams are a type of direted graph, in whih the graph nodes represent states and labels on the graph

More information

WEIGHTED MAXIMUM OVER MINIMUM MODULUS OF POLYNOMIALS, APPLIED TO RAY SEQUENCES OF PADÉ APPROXIMANTS

WEIGHTED MAXIMUM OVER MINIMUM MODULUS OF POLYNOMIALS, APPLIED TO RAY SEQUENCES OF PADÉ APPROXIMANTS WEIGHTED MAXIMUM OVER MINIMUM MODULUS OF POLYNOMIALS, APPLIED TO RAY SEQUENCES OF PADÉ APPROXIMANTS D.S. LUBINSKY Abstrat. Let a 0; " > 0. We use otential theory to obtain a shar lower bound for the linear

More information

Investigating the Performance of a Hydraulic Power Take-Off

Investigating the Performance of a Hydraulic Power Take-Off Investigating the Performane of a Hydrauli Power Take-Off A.R. Plummer and M. Shlotter Centre for Power Transmission and Control, University of Bath, Bath, BA 7AY, UK E-mail: A.R.Plummer@bath.a.uk E-mail:

More information

Average Rate Speed Scaling

Average Rate Speed Scaling Average Rate Speed Saling Nikhil Bansal David P. Bunde Ho-Leung Chan Kirk Pruhs May 2, 2008 Abstrat Speed saling is a power management tehnique that involves dynamially hanging the speed of a proessor.

More information

Takeshi Kurata Jun Fujiki Katsuhiko Sakaue. 1{1{4 Umezono, Tsukuba-shi, Ibaraki , JAPAN. fkurata, fujiki,

Takeshi Kurata Jun Fujiki Katsuhiko Sakaue. 1{1{4 Umezono, Tsukuba-shi, Ibaraki , JAPAN. fkurata, fujiki, Ane Eiolar Geometry via Fatorization Method akeshi Kurata Jun Fujiki Katsuhiko Sakaue Eletrotehnial Laboratory {{4 Umezono, sukuba-shi, Ibaraki 305-8568, JAPAN fkurata, fujiki, sakaueg@etl.go.j Abstrat

More information

Lecture 7: Sampling/Projections for Least-squares Approximation, Cont. 7 Sampling/Projections for Least-squares Approximation, Cont.

Lecture 7: Sampling/Projections for Least-squares Approximation, Cont. 7 Sampling/Projections for Least-squares Approximation, Cont. Stat60/CS94: Randomized Algorithms for Matries and Data Leture 7-09/5/013 Leture 7: Sampling/Projetions for Least-squares Approximation, Cont. Leturer: Mihael Mahoney Sribe: Mihael Mahoney Warning: these

More information

LOGISTIC REGRESSION IN DEPRESSION CLASSIFICATION

LOGISTIC REGRESSION IN DEPRESSION CLASSIFICATION LOGISIC REGRESSIO I DEPRESSIO CLASSIFICAIO J. Kual,. V. ran, M. Bareš KSE, FJFI, CVU v Praze PCP, CS, 3LF UK v Praze Abstrat Well nown logisti regression and the other binary response models an be used

More information

Machine Learning Applied to Alliance Networks

Machine Learning Applied to Alliance Networks Mahine Learning Alied to Alliane Networks Telmo Menezes (telmo@telmomenezes.om) CAMS-EHESS / CNRS Klaus Hamberger LAS / EHESS / CNRS Camille Roth CAMS-EHESS / ISC-PIF / CNRS The Problem How to disover

More information

Case I: 2 users In case of 2 users, the probability of error for user 1 was earlier derived to be 2 A1

Case I: 2 users In case of 2 users, the probability of error for user 1 was earlier derived to be 2 A1 MUTLIUSER DETECTION (Letures 9 and 0) 6:33:546 Wireless Communiations Tehnologies Instrutor: Dr. Narayan Mandayam Summary By Shweta Shrivastava (shwetash@winlab.rutgers.edu) bstrat This artile ontinues

More information

Assessing the Performance of a BCI: A Task-Oriented Approach

Assessing the Performance of a BCI: A Task-Oriented Approach Assessing the Performane of a BCI: A Task-Oriented Approah B. Dal Seno, L. Mainardi 2, M. Matteui Department of Eletronis and Information, IIT-Unit, Politenio di Milano, Italy 2 Department of Bioengineering,

More information

The Hyperbolic Region for Restricted Isometry Constants in Compressed Sensing

The Hyperbolic Region for Restricted Isometry Constants in Compressed Sensing INTERNATIONAL JOURNAL OF CIRCUITS SYSTEMS AND SIGNAL PROCESSING Volume 8 Te Hyeroli Region for Restrited Isometry Constants in Comressed Sensing Siqing Wang Yan Si and Limin Su Astrat Te restrited isometry

More information

Convex Optimization methods for Computing Channel Capacity

Convex Optimization methods for Computing Channel Capacity Convex Otimization methods for Comuting Channel Caacity Abhishek Sinha Laboratory for Information and Decision Systems (LIDS), MIT sinhaa@mit.edu May 15, 2014 We consider a classical comutational roblem

More information

Research on Components Characteristics of Single Container Friction Hoist

Research on Components Characteristics of Single Container Friction Hoist 6 3 rd International Conferene on Mehanial, Industrial, and Manufaturing Engineering (MIME 6) ISB: 978--6595-33-7 Researh on Comonents Charateristis of Single Container Frition oist irong Xie, Jing Cheng,

More information

Theory of the Integer and Fractional Quantum Hall Effects

Theory of the Integer and Fractional Quantum Hall Effects Theory of the Integer and Frational Quantum all Effets Shosuke SASAKI Center for Advaned igh Magneti Field Siene, Graduate Shool of Siene, Osaka University, - Mahikaneyama, Toyonaka, Osaka 56-, Jaan Prefae

More information

Flow Characteristics of High-Pressure Hydrogen Gas in the Critical Nozzle

Flow Characteristics of High-Pressure Hydrogen Gas in the Critical Nozzle Flow Charateristis of High-Pressure Hydrogen Gas in the Critial Nozzle Shigeru MATSUO *1, Soihiro KOYAMA *2, Junji NAGAO *3 and Toshiaki SETOGUCHI *4 *1 Saga University, Det. of Mehanial Engineering, 1

More information

A Characterization of Wavelet Convergence in Sobolev Spaces

A Characterization of Wavelet Convergence in Sobolev Spaces A Charaterization of Wavelet Convergene in Sobolev Spaes Mark A. Kon 1 oston University Louise Arakelian Raphael Howard University Dediated to Prof. Robert Carroll on the oasion of his 70th birthday. Abstrat

More information

Elementary Analysis in Q p

Elementary Analysis in Q p Elementary Analysis in Q Hannah Hutter, May Szedlák, Phili Wirth November 17, 2011 This reort follows very closely the book of Svetlana Katok 1. 1 Sequences and Series In this section we will see some

More information

Singular Event Detection

Singular Event Detection Singular Event Detetion Rafael S. Garía Eletrial Engineering University of Puerto Rio at Mayagüez Rafael.Garia@ee.uprm.edu Faulty Mentor: S. Shankar Sastry Researh Supervisor: Jonathan Sprinkle Graduate

More information

Subject: Introduction to Component Matching and Off-Design Operation % % ( (1) R T % (

Subject: Introduction to Component Matching and Off-Design Operation % % ( (1) R T % ( 16.50 Leture 0 Subjet: Introdution to Component Mathing and Off-Design Operation At this point it is well to reflet on whih of the many parameters we have introdued (like M, τ, τ t, ϑ t, f, et.) are free

More information

max min z i i=1 x j k s.t. j=1 x j j:i T j

max min z i i=1 x j k s.t. j=1 x j j:i T j AM 221: Advaned Optimization Spring 2016 Prof. Yaron Singer Leture 22 April 18th 1 Overview In this leture, we will study the pipage rounding tehnique whih is a deterministi rounding proedure that an be

More information

Reliability Guaranteed Energy-Aware Frame-Based Task Set Execution Strategy for Hard Real-Time Systems

Reliability Guaranteed Energy-Aware Frame-Based Task Set Execution Strategy for Hard Real-Time Systems Reliability Guaranteed Energy-Aware Frame-Based ask Set Exeution Strategy for Hard Real-ime Systems Zheng Li a, Li Wang a, Shuhui Li a, Shangping Ren a, Gang Quan b a Illinois Institute of ehnology, Chiago,

More information

Wavetech, LLC. Ultrafast Pulses and GVD. John O Hara Created: Dec. 6, 2013

Wavetech, LLC. Ultrafast Pulses and GVD. John O Hara Created: Dec. 6, 2013 Ultrafast Pulses and GVD John O Hara Created: De. 6, 3 Introdution This doument overs the basi onepts of group veloity dispersion (GVD) and ultrafast pulse propagation in an optial fiber. Neessarily, it

More information

Distributed Rule-Based Inference in the Presence of Redundant Information

Distributed Rule-Based Inference in the Presence of Redundant Information istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced

More information

3. THE SOLUTION OF TRANSFORMATION PARAMETERS

3. THE SOLUTION OF TRANSFORMATION PARAMETERS Deartment of Geosatial Siene. HE SOLUION OF RANSFORMAION PARAMEERS Coordinate transformations, as used in ratie, are models desribing the assumed mathematial relationshis between oints in two retangular

More information

Sensitivity Analysis in Markov Networks

Sensitivity Analysis in Markov Networks Sensitivity Analysis in Markov Networks Hei Chan and Adnan Darwihe Computer Siene Department University of California, Los Angeles Los Angeles, CA 90095 {hei,darwihe}@s.ula.edu Abstrat This paper explores

More information

Collocations. (M&S Ch 5)

Collocations. (M&S Ch 5) Colloations M&S Ch 5 Introdution Colloations are haraterized by limited ompositionality. Large overlap between the onepts of olloations and terms tehnial term and terminologial phrase. Colloations sometimes

More information

Smoldering Combustion of Horizontally Oriented Polyurethane Foam with Controlled Air Supply

Smoldering Combustion of Horizontally Oriented Polyurethane Foam with Controlled Air Supply Smoldering Combustion of Horizontally Oriented Polyurethane Foam with Controlled Air Suly LEI PENG, CHANG LU, JIANJUN ZHOU, LINHE ZHANG, and FEI YOU State Key Laboratory of Fire Siene, USTC Hefei Anhui,

More information

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data Quality Technology & Quantitative Management Vol. 1, No.,. 51-65, 15 QTQM IAQM 15 Lower onfidence Bound for Process-Yield Index with Autocorrelated Process Data Fu-Kwun Wang * and Yeneneh Tamirat Deartment

More information

Nonreversibility of Multiple Unicast Networks

Nonreversibility of Multiple Unicast Networks Nonreversibility of Multiple Uniast Networks Randall Dougherty and Kenneth Zeger September 27, 2005 Abstrat We prove that for any finite direted ayli network, there exists a orresponding multiple uniast

More information

Relativistic Dynamics

Relativistic Dynamics Chapter 7 Relativisti Dynamis 7.1 General Priniples of Dynamis 7.2 Relativisti Ation As stated in Setion A.2, all of dynamis is derived from the priniple of least ation. Thus it is our hore to find a suitable

More information

NUMERICAL SIMULATIONS OF METAL ONTO COMPOSITES CASTING

NUMERICAL SIMULATIONS OF METAL ONTO COMPOSITES CASTING NUMERICAL SIMULATIONS OF METAL ONTO COMPOSITES CASTING E. Marklund 1, D. Mattsson 2, M. Svanberg 2 and M. Wysoki 1* 1 Swerea SICOMP AB, Box 14, SE-431 22 Mölndal, Sweden 2 Swerea SICOMP AB, Box 271, SE-941

More information

On a Networked Automata Spacetime and its Application to the Dynamics of States in Uniform Translatory Motion

On a Networked Automata Spacetime and its Application to the Dynamics of States in Uniform Translatory Motion On a Networked Automata Saetime and its Aliation to the Dynamis of States in Uniform Translatory Motion Othon Mihail Deartment of Comuter Siene, University of Liverool, Liverool, UK Othon.Mihail@liverool.a.uk

More information

Taste for variety and optimum product diversity in an open economy

Taste for variety and optimum product diversity in an open economy Taste for variety and optimum produt diversity in an open eonomy Javier Coto-Martínez City University Paul Levine University of Surrey Otober 0, 005 María D.C. Garía-Alonso University of Kent Abstrat We

More information

arxiv:cond-mat/ v1 [cond-mat.stat-mech] 16 Aug 2004

arxiv:cond-mat/ v1 [cond-mat.stat-mech] 16 Aug 2004 Computational omplexity and fundamental limitations to fermioni quantum Monte Carlo simulations arxiv:ond-mat/0408370v1 [ond-mat.stat-meh] 16 Aug 2004 Matthias Troyer, 1 Uwe-Jens Wiese 2 1 Theoretishe

More information

4. Score normalization technical details We now discuss the technical details of the score normalization method.

4. Score normalization technical details We now discuss the technical details of the score normalization method. SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules

More information

Planar Undulator Considerations

Planar Undulator Considerations LCC-0085 July 2002 Linear Collider Collaboration Teh Notes Planar Undulator Considerations John C. Sheard Stanford Linear Aelerator Center Stanford University Menlo Park, California Abstrat: This note

More information