Data Discovery and Anomaly Detection Using Atypicality: Theory

Anders Høst-Madsen, Fellow, IEEE, Elyas Sabeti, Member, IEEE, Chad Walton

A. Høst-Madsen and E. Sabeti are with the Department of Electrical Engineering, University of Hawaii Manoa, Honolulu, HI 968 (e-mail: {ahm,sabeti}@hawaii.edu). C. Walton is with the Department of Surgery, University of Hawaii, Honolulu, HI (email: cwalton@hawaii.edu). This work was supported in part by NSF grants CCF 0783, 07775, and . The paper was presented in part at IEEE Information Theory Workshop 2013, Seville.

Abstract: A central question in the era of big data is what to do with the enormous amount of information. One possibility is to characterize it through statistics, e.g., averages, or classify it using machine learning, in order to understand the general structure of the overall data. The perspective in this paper is the opposite, namely that most of the value in the information in some applications is in the parts that deviate from the average, that are unusual, atypical. We define what we mean by atypical in an axiomatic way as data that can be encoded with fewer bits in itself rather than using the code for the typical data. We show that this definition has good theoretical properties. We then develop an implementation based on universal source coding, and apply this to a number of real world data sets.

Index Terms: Big Data, atypicality, minimum description length, data discovery, anomaly.

I. INTRODUCTION

One characteristic of the information age is the exponential growth of information, and the ready availability of this information through networks, including the internet (Big Data). The question is what to do with this enormous amount of information. One possibility is to characterize it through statistics (think averages). The perspective in this paper is the opposite, namely that most of the value in the information is in the parts that deviate from the average, that are unusual, atypical. The rest is just background noise. Take art: the truly valuable paintings are those that are rare and atypical. The same could be true for scientific research and entrepreneurship. Take online collections of photos, such as Flickr.com. Most of the photos are rather pedestrian snapshots and not of interest to a wider audience. The photos that are of interest are those that are unique. Flickr has a collection of photos rated for interestingness, and one can notice that those photos are indeed very different from typical photos. They are atypical.

The aim of our approach is to extract such rare interesting data out of big data sets. The central question is what interesting means. A first thought is to focus on the rare part. That is, interesting data is something that
is unlikely based on prior knowledge of typical data or examples of typical data, i.e., training. This is the way an outlier is usually defined. Unlikeliness could be measured in terms of likelihood, in terms of codelength [], [] (called surprise in [3]), or according to some distance measure. This is also the most common principle in anomaly detection [4]. However, perhaps being unlikely is not sufficient for something to be interesting. In many cases, outliers are junk that is eliminated so as not to contaminate the typical data. What makes something interesting is maybe that it has a new unusual structure in itself that is quite different from the structure of the data we have already seen. Return to the example of paintings: what makes masterworks interesting is not just that they are different from other paintings, but that they have some structure that is intriguing. Or take another example. Many scientific discoveries, like the theory of relativity and quantum mechanics, began with experiments that did not fit with prevailing theories. The experiments were outliers or anomalies. What made them truly interesting was that it was possible to find a new theory to explain the data, be it relativity or quantum mechanics. This is the principle we pursue: finding data that have better alternative explanations than those that fit the typical data.

Something being unlikely is not even necessary for the data to be interesting. Suppose the typical data is iid uniform $\{0, 1\}$. Then any sequence of bits is equally likely. Therefore, a sequence consisting purely of $1, 1, 1, \ldots$ is in no way surprising. Yet, it should catch our interest.

When we look for new interesting data, a characteristic is that we do not know what we are looking for. We are looking for unknown unknowns [5]. Instead of looking at specific statistics of data, we need to use a universal approach. This is provided by information theory.

This idea of finding alternative explanations for data, rather than measuring some kind of difference from typical data, is what separates our method from usual approaches in outlier detection and anomaly detection. As far as we can determine from reading hundreds of papers, our approach has not been explored previously. Obviously, information theory and coding have been used in anomaly detection, data mining, and knowledge discovery before, and we will discuss how this compares to our approach later. Our methodology also has connections to tests for randomness, e.g., the run length test and [6], [7], but our aim is different.

A. Applications

Atypicality is relevant in a large number of applications. We will list a few here.

ECG. For electrocardiogram (ECG) recordings there are patterns in heart rate variability that are known to indicate possible heart disease [8], [9], [0], []. With modern technology it is possible for an individual to wear an unobtrusive heart rate monitor 24/7. If atypical patterns occur, it could be indicative of disease, and the individual or a doctor could be notified. But perhaps a more important application is to medical research. One can analyze a large collection of ECG recordings and look for individuals with atypical patterns. This can then potentially be used to develop new diagnostic tools.

Genomics. Another example of application is interpretation of large collections of genomics data. Given that all mammals have essentially the same set of genes, there must exist some significant differences that distinguish the obvious distinct attributes between species, as well as more subtle differences within a species. Although the
genome has been mined by exhaustive studies applying a panoply of approaches, regions once thought to be uninteresting have recently come under increased study for their potential role in defined morphological and physiological differences between individuals []. Applying an atypical evaluation tool to genomic data from individuals of known pathophysiological/morphological irregularities may provide valuable insight into the genetic mechanisms underlying the condition.

Ocean Monitoring. In passive acoustic monitoring (PAM) [3] of oceans, one or more hydrophones is towed behind a ship or deployed in a fixed bottom-mounted or suspended array in order to record vocalizations of marine mammals. One major focus is to detect, and perhaps count, rare or endangered species. It would be highly interesting to scan the data for any unusual patterns, which can then be further examined by a researcher.

Plant Monitoring. In, for example, nuclear plants, atypical monitoring data may be indicative of something about to go wrong.

Computer Networks. Atypical network traffic could be indicative of a cyberattack. This is already being used through anomaly detection [4]. However, an abstract atypicality approach can be used to find more subtle attacks (the unknown unknowns).

Airport Security. Software is already being used to flag suspicious flyers, likely based on past attacks. Atypical detection could be used to find innovative attackers.

Stock Market. Atypicality could be used to detect insider trading. It could also be used by investors to find unusual stocks to invest in, promising outstanding returns or ruin.

Astronomy. Atypicality can be used to scan huge databases for new kinds of cosmological phenomena.

Credit Card Fraud. Unusual spending patterns could be indicative of fraud. This is already used by credit card companies, but obviously in a simple and annoying way, as anyone whose credit card has been blocked on an overseas trip can testify.

Gambling. Casinos are constantly fighting fraudsters. This is a game of cat and mouse. Fraudsters constantly find new ways to trick the casinos (one such inventor was Shannon himself). Therefore, an abstract atypicality approach may be the best solution to catch new ways of fraud.

B. Notation

We use $x$ to denote a sequence in general, and $x^l$ when we need to make the length explicit; $x_i$ denotes a single sample of the sequence. We use capital letters $X_i$ to denote random variables rather than specific outcomes. Finally $\mathcal{X}$ denotes a subsequence. All logarithms are to base 2 unless otherwise indicated.

II. ATYPICALITY

Our starting point is the theory of randomness developed by Kolmogorov and Martin-Löf [5], [7], [6]. Kolmogorov divides infinite sequences into typical and special. The typical sequences are those that we can call random, that is, they satisfy all laws of probability. They can be characterized through Kolmogorov complexity. A sequence of bits $\{x_n, n = 1, \ldots, \infty\}$ is random (i.e., iid uniform) if the Kolmogorov complexity of
the sequence satisfies $K(x_1, \ldots, x_n) \ge n - c$ for some constant $c$ and for all $n$ [5]. The sequence is incompressible if $K(x_1, \ldots, x_n \mid n) \ge n$ for all $n$, and a finite sequence is algorithmically random if $K(x_1, \ldots, x_n \mid n) \ge n$ [6]. In terms of coding, an iid random sequence is also incompressible, or, put another way, the best coder is the identity function. Let us assume we draw sequences $x^n$ from an iid uniform distribution. The optimum coder is the identity function, and the code length is $n$. Now suppose that for one of these sequences we can find a universal coder so that the code length is less than $n$; while not directly equivalent, one could state this as $K(x_1, \ldots, x_n \mid n) < n$. With an interpretation of Kolmogorov's terms, this would not be a typical sequence, but a special sequence. We will instead call such sequences atypical. Considering general distributions and general finite alphabets instead of iid uniform distributions, we can state this in the following general principle.

Definition 1. A sequence is atypical if it can be described (coded) with fewer bits in itself rather than using the (optimum) code for typical sequences.

This definition is central to our approach to the atypicality problem. In the definition, "the (optimum) code for typical sequences" is quite specific, following the principles in, for example, [6]. We assume prefix free codes. Within that class the coding could be done using Huffman codes, Shannon codes, Shannon-Fano-Elias codes, arithmetic coding, etc. We care only about the code length, and among these the variation in length is within a few bits, so that the code length for typical encoding can be quite accurately calculated. On the other hand, "described (coded) with fewer bits in itself" is less precise. In principle one could use Kolmogorov complexity, but Kolmogorov complexity is not calculable and it is only given up to a constant, and comparison with code length therefore is not an apples-to-apples comparison. Rather, some type of universal source coder should be used. This can be given a quite precise meaning in the class of finite state machine sources, [7] and following work, and is strongly related to minimum description length (MDL) [8], [9], [0], [7].

What is essential is that we adhere to strict decodability at the decoder. The decoder only sees a stream of bits, and from this it should be able to accurately reconstruct the source sequence. So, for example, if a sequence is atypical, there must be a type of header telling the decoder to use a universal decoder rather than the typical decoder. Or, if atypical sequences can be encoded in multiple ways, the decoder must be informed through the sequence of bits which encoder was used. One could argue that such things are irrelevant for, say, anomaly detection, since we are not actually encoding sequences. The problem is that if such terms are omitted, it is far too easy to encode a sequence in itself. This is like choosing a more complex model to fit data without accounting for the model complexity in itself, which is exactly what MDL sets out to solve, although also in this case actual encoding is not done. We therefore try to account for all factors needed to describe data, and we believe this is one of the key strengths of the approach.

A major difference between atypical data and anomalous data is that atypicality is an axiomatic property of data, defined by Definition 1 based on Kolmogorov-Martin-Löf randomness. On the other hand, as far as we know, an
anomaly is not something that can be strictly defined. Usually, we think of an anomaly as something caused by an outside phenomenon: an intruder in a computer network, a heart failure, a gambler playing tricks. This influences how we think of performance. If a detector fails to give an indication of an anomaly, we have a miss (or type II error), but if it gives an indication when such things are not happening we have a false alarm (type I error). Atypicality, on the other hand, is purely a property of data. Ideally, there are therefore no misses or false alarms: data is atypical or not. Here is what we mean. If there is an anomaly that expresses itself through the observed data, that must mean that there is some structure in the data, and in theory a source coder would discover and exploit such structure and reduce code length. Thus, if the data is not atypical, that means there is simply no way to detect the anomaly through the observations (again in theory). We therefore cannot really call that a miss. On the other hand, suppose that in a casino a gambler has a long sequence of wins. This could be due to fraud, but it could also be simply due to randomness. But casino security would be interested in either case for further scrutiny. Thus, the reason for the atypicality does not really matter; the atypicality itself matters. Still, to distinguish the two cases we call a sequence intrinsically atypical if it is atypical according to Definition 1 while being generated from the typical probability model, while it is extrinsically atypical if it is in fact generated by any other probability law.

Definition 1 has two parts that work in concert, and we can write it simplified as

$$C_t(x) - C_a(x) > 0$$

where $C_t$ is the typical codelength and $C_a$ the atypical codelength. The typical code length $C_t(x)$ is simply an expression of the likelihood of seeing a particular sequence. If $C_t(x)$ is large it means that the given sequence is unlikely to happen, and detecting sequences by $C_t(x) > \tau$ would catch many outliers. As an extreme example, if a sequence is impossible according to the typical distribution, $C_t(x) = \infty$, and it would always be caught. But it would not work universally. If, as we started out with, typical sequences are iid uniform, any sequence is equally likely and $C_t(x) > \tau$ would not catch any sequences. In this case, if a test sequence has some structure, it is possible that $C_a(x) < C_t(x)$, and such sequences would be caught by atypicality; thus calculating $C_a(x)$ is essential.

Calculating $C_t(x)$ is also essential. Suppose that we instead use $E[C_t] - C_a(x)$, where $E[C_t]$ is the code length used to encode typical sequences on average, essentially the entropy rate. Again, this will catch some sequences: if a test sequence has more or less structure than typical sequences, $E[C_t] - C_a(x) \neq 0$. But again, it will omit very obvious examples: if as test sequence we use a typical sequence with 0 and 1 swapped, $E[C_t] \approx C_a(x)$, while on the other hand $C_t(x) > C_a(x)$. And impossible sequences with $C_t(x) = \infty$ would not be caught with absolute certainty.

Now, to declare something an outlier, we have to find a coder with $C_a(x) < C_t(x)$. It is not sufficient that $C_t(x)$ is large, i.e., that the sequence is unlikely to happen. However, we can always use the trivial coder that transmits data uncoded. If the sequence is unlikely to happen according to the typical distribution, then it is likely that $C_t(x)$ exceeds the length of $x$. Thus, it can be seen that the two parts work in concert to catch sequences. Each part might catch some sequences, but to catch all anomalies, both parts have to be used.

Another point of view is the following. Suppose again the typical model is binary uniform iid. We look at a collection of sequences, and now we want to find the most atypical sequences, i.e., the most interesting sequences. Without a specification of what interesting is, it seems reasonable to choose those sequences that have the most
structure, and again this can reasonably be measured by how much the sequence can be compressed. This is what Rissanen [7] calls useful information, $U(x) = n - C_a(x)$. But again, we need to take into account the typical model if it is not uniform iid. For example, if typical sequences have much structure, then sequences with little structure might be more interesting. We therefore conclude that $C_t(x) - C_a(x)$ is a reasonable measure of how interesting sequences might be.

A. Alternative approaches

While, as argued in the introduction and outlined above, what we are aiming for is not anomaly detection in the traditional sense, there are still many similarities. And certainly information theory and universal source coding have been used previously in anomaly detection, e.g., [4], [], [], [3], [4], [5], [6], [7], [8], [9], [30]. The approaches have mostly been heuristic. A more fundamental and systematic approach is Information Distance defined in [3]. Without being able to claim that this applies to all of the perhaps hundreds of papers, we think the various approaches can be summarized as using universal source coding as a type of distance measure, whether it satisfies strict mathematical metric properties as in [3] or is more heuristic. On the other hand, our methodology in Definition 1 cannot be classified as a distance measure in a traditional sense. We are instead trying to find alternative explanations for data. We will comment on how our approach contrasts with a few other approaches.

While the similarity distance developed in [3] is not directly applicable to the problem we consider, we can to some extent adapt it, which is useful for contrast. The similarity distance is

$$d(x, y) = \frac{\min\{K(y \mid x), K(x \mid y)\}}{\max\{K(x), K(y)\}}$$

Instead of being given the typical distribution, we can imagine that we are given a very long typical sequence $x$ which is used for training. In that case

$$d(x, y) = \frac{K(x \mid y)}{K(x)} = \frac{K(x, y) - K(y)}{K(x)}$$

within a certain approximation. Suppose, as was our starting point above, that the typical distribution is binary iid uniform. If $y$ is also binary iid uniform, within a constant $K(x, y) = K(x) + K(y)$, and $d = 1$. But if $y$ is drawn from some other distribution, $y$ cannot help in describing $x$ either, and still $d = 1$. That makes sense: two completely random sequences are not similar, whether they are from the same distribution or not. Thus, similarity distance cannot be used for anomaly detection as we have defined it: looking for special sequences in the words of Kolmogorov. This is not a problem of the similarity metric; it does exactly what it is designed for, which is really deterministic similarity between sequences, appropriate for classification. The reason similarity distance still gives results for anomaly detection [3] is actually that universal source coders approximate Kolmogorov complexity poorly.

Heuristic methods for anomaly detection using universal source coding [4], [], [], [3], [4], [5], [6], [7], [8], [9], [30], [], [] are mostly based on comparing code length. Let $C(x)$ be the code length to encode the sequence $x$ with a universal source coder. Let $x$ be a training string and $y$ a test sequence. We can
then compare $\frac{1}{|x|} C(x)$ with $\frac{1}{|y|} C(y)$ (which could be seen as a measure of entropy rate) or compare $C(xy)$ with $C(x)$ to detect change. The issue with this is that there are many completely dissimilar sources that have the same entropy rate. As an example, let the data be binary iid with the original source having $P(X = 1) = \frac{1}{3}$ and the new source $P(X = 1) = \frac{2}{3}$. Then the optimum code for the original source and the optimum code for the new source have the same length. On the other hand, atypicality will immediately distinguish such sequences.

III. BINARY IID CASE

In order to clarify ideas, at first we consider a very simple model. The typical model is iid binary with $P(X_n = 1) = p$. The alternative model class is also binary iid but with $P(X_n = 1) = \theta$, where $\theta$ is unknown. We want to decide if a given sequence $x^l$ is typical or atypical. This can be stated as the hypothesis test problem

$$H_0: \theta = p \qquad H_1: \theta \neq p$$

This problem does not have a UMP (uniformly most powerful) test. However, a common approach to solving this type of problem is the GLRT (generalized likelihood ratio test) [33]. Let

$$P(b) = P(X_n = b), \qquad \hat{P}(b) = \frac{N(b \mid x)}{l}$$

where $l$ is the sequence length and $N(b \mid x)$ is the number of $x_n = b \in \{0, 1\}$. The GLRT is

$$L = \log \frac{\prod_{b=0}^{1} \hat{P}(b)^{N(b \mid x)}}{\prod_{b=0}^{1} P(b)^{N(b \mid x)}} = \sum_{b=0}^{1} N(b \mid x) \log \frac{N(b \mid x)}{l} - \sum_{b=0}^{1} N(b \mid x) \log P(b)$$

$$= l \sum_{b=0}^{1} \hat{P}(b) \log \hat{P}(b) - l \sum_{b=0}^{1} \hat{P}(b) \log P(b) = l\, D(\hat{p} \| p)$$

$$\phi(x) = \begin{cases} 1 & L > t \\ 0 & L \le t \end{cases} \qquad (1)$$

where $D(\hat{p} \| p) = \sum_{b=0}^{1} \hat{P}(b) \log \frac{\hat{P}(b)}{P(b)}$ is the relative entropy [6] and $t$ some threshold. While the GLRT is a heuristic principle, it satisfies some optimality properties, and in this case it is equal to the invariant UMP test [34], which can be considered an optimum solution under certain constraints. Thus, it is reasonable to take this as the optimum solution for this problem, and we do not need to appeal to Kolmogorov or information theory to solve the problem.

The complications start if we consider sequences of variable length. The test depends on the sequence length. We need to choose a threshold $t = t(l)$ as a function of $l$, which will then result in a false alarm probability $P_{FA}(t(l))$
and detection probability $P_D(t(l))$. There is no obvious argument for how to choose $t(l)$ from a hypothesis testing point of view; we could choose $t$ independent of $l$, but that is just another arbitrary choice.

We will consider this problem in the context of Definition 1. In order to do so, we need to model the problem from a coding point of view. We assume we have an infinite sequence of sequences of variable length $l_i$, and these need to be encoded. We need to encode each bit, and also to encode whenever a new sequence starts. For typical encoding of the bits we can use a Shannon code, Huffman code, arithmetic coding, etc. The code length for a sequence of length $l$ is

$$L_t = N(1 \mid x) \log \frac{1}{p} + N(0 \mid x) \log \frac{1}{1-p} = l \left( \hat{p} \log \frac{1}{p} + (1 - \hat{p}) \log \frac{1}{1-p} \right) \qquad (2)$$

except for a small constant factor; here $\hat{p} = \hat{P}(1) = \frac{1}{l} \sum_{i=1}^{l} x_i$. We also need to encode where a sequence ends and a new one starts. For simplicity let us for now assume lengths are geometrically distributed. We can then model the problem as one with three source symbols '0', '1' and ',', with an iid distribution with $P(\text{','}) = \epsilon$, $P(0) = (1-p)(1-\epsilon)$, $P(1) = p(1-\epsilon)$. If we assume $\epsilon$ is small, the expression (2) is still valid for the content part, and to each sequence is added a constant $-\log \epsilon$ to encode separators.

To decide if a sequence is atypical according to Definition 1, we can use the universal source coder from [6]: the source first encodes the number of ones $k$; then it enumerates the sequences with $k$ ones, and transmits the index of the given sequence. For analysis it is important to have a simple expression for the code length. We can therefore use $L_a = l H(\hat{p}) + \frac{1}{2} \log l$. This is an approximation which is good for reasonably large $l$, and it also reaches the lower bound in [7], [35]. The source coder also needs to inform the decoder that the following is an atypical sequence, so that it knows to use the atypical decoder rather than the typical decoder, and where it ends. For the former we can use a '.' to indicate the start of an atypical sequence rather than the ',' for typical sequences. If the probability that a sequence is atypical is $\delta$, then $P(\text{'.'}) = \delta\epsilon$ and $P(\text{','}) = (1-\delta)\epsilon \approx \epsilon$. The code length for a '.' now is $-\log \epsilon - \log \delta$. To mark the end of the atypical sequence we could again insert a '.' or a ','. But the code for either is based on the distribution of lengths of typical sequences, which we assume known, whereas we would have no knowledge of the length of atypical sequences. Instead it seems more reasonable to encode the length of the specific atypical sequence. As argued in [8], [36] this can be done with $\log^* l + \log c$ bits, where $c$ is a constant and

$$\log^* l = \log l + \log\log l + \log\log\log l + \cdots \qquad (3)$$

where the sum continues as long as the argument to the log is positive. To summarize we have

$$L_t = l \left( \hat{p} \log \frac{1}{p} + (1 - \hat{p}) \log \frac{1}{1-p} \right) - \log \epsilon$$

$$L_a = l H(\hat{p}) + \frac{1}{2} \log l + \log^* l + \log c - \log \epsilon - \log \delta \approx l H(\hat{p}) + \frac{3}{2} \log l - \log \epsilon + \tau$$

$$\tau = -\log \delta + \log c \qquad (4)$$
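The two code lengths summarized in (4) can be evaluated directly. The following is a minimal sketch, not the authors' implementation: the function names, the default values of `eps` and `delta`, and the numerical value 2.865 used for the constant $c$ are all assumptions made for illustration.

```python
import math

# Assumed numerical value for the constant c in log*(l) + log c.
RISSANEN_C = 2.865


def log_star(l: int) -> float:
    """log* l = log l + log log l + ... (base 2), while the terms stay positive."""
    total, term = 0.0, math.log2(l)
    while term > 0:
        total += term
        term = math.log2(term)
    return total


def binary_entropy(p_hat: float) -> float:
    """Binary entropy H(p_hat) in bits, with H(0) = H(1) = 0."""
    if p_hat <= 0.0 or p_hat >= 1.0:
        return 0.0
    return -p_hat * math.log2(p_hat) - (1 - p_hat) * math.log2(1 - p_hat)


def typical_length(x, p, eps):
    """L_t: code length under the typical iid model plus the separator cost."""
    ones = sum(x)
    zeros = len(x) - ones
    return ones * math.log2(1 / p) + zeros * math.log2(1 / (1 - p)) - math.log2(eps)


def atypical_length(x, eps, delta, c=RISSANEN_C):
    """L_a: enumerative code length plus header and length-encoding costs."""
    l = len(x)
    return (l * binary_entropy(sum(x) / l) + 0.5 * math.log2(l) + log_star(l)
            + math.log2(c) - math.log2(eps) - math.log2(delta))


def is_atypical(x, p, eps=0.01, delta=0.01):
    """Definition 1 for the binary iid case: atypical if L_a < L_t."""
    return atypical_length(x, eps, delta) < typical_length(x, p, eps)


print(is_atypical([1] * 64, p=0.5))         # a long run of ones
print(is_atypical([0, 1] * 32, p=0.5))      # balanced, so no gain in this model class
print(is_atypical([1, 1, 0] * 30, p=1 / 3)) # the 1/3 versus 2/3 example
```

With these assumed parameters the run of ones and the sequence with two thirds ones under a typical $p = \frac{1}{3}$ are flagged, while the alternating sequence is not, because the binary iid model class used for $L_a$ only sees the fraction of ones.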

The criterion for a sequence to be atypical is $L_a < L_t$, which is easily seen to be equivalent to

$$D(\hat{p} \| p) > \frac{\tau + \frac{3}{2} \log l}{l} \qquad (5)$$

If the lengths are fixed, this reduces to (1). But if the lengths are variable, (5) provides a threshold as a function of $l$. The term $\frac{3}{2} \log l$ ensures that $\lim_{l \to \infty} P_{FA} = 0$, which seems reasonable. If instead $D(\hat{p} \| p) > \frac{\tau}{l}$ is used, it is easy to see that $\lim_{l \to \infty} P_{FA} > 0$. Except for this property, the term $\frac{3}{2} \log l$ might seem arbitrary, e.g., why $\frac{3}{2}$? But it is based on solid theory, and as will be seen later it has several important theoretical properties.

We will examine the criterion (5) in more detail. The inequality (5) gives two thresholds for $\hat{p}$,

$$\hat{p} > p_+ \qquad \hat{p} < p_-$$

where $0 < p_- < p < p_+ < 1$. It is impossible to find explicit expressions for $p_\pm$, but it is clear that $p_\pm \to p$ as $l \to \infty$. Therefore, for large $l$, we can replace $D(\hat{p} \| p)$ with a series expansion. We then end up with the more explicit criterion

$$\frac{(p - \hat{p})^2}{pq \ln 4} > \frac{\tau + \frac{3}{2} \log l}{l} \quad \Longleftrightarrow \quad |\hat{p} - p| > \sqrt{\frac{pq \ln 4 \left( \tau + \frac{3}{2} \log l \right)}{l}} \qquad (6)$$

In the following we will use this as it is considerably simpler to analyze. We can also write this as

$$\left| \frac{1}{\sqrt{l}} \sum_{i=1}^{l} \frac{x_i - p}{\sqrt{pq}} \right| > \sqrt{2\tau \ln 2 + 3 \ln l} \qquad (7)$$

Now, if not for the term $3 \ln l$, this would be a central limit type of statement, and the probability that a sequence is classified as intrinsically atypical would be $P_A \approx 2Q\!\left(\sqrt{2\tau \ln 2}\right)$, independent of $l$. Our main interest is exactly the dependency on $l$, which is given by the following theorem.

Theorem 2. Consider an iid $\{0, 1\}$-sequence. Let $P_A(l)$ be the probability that a sequence of length $l$ is classified as intrinsically atypical according to (6). Then $P_A(l)$ is bounded by

$$P_A(l) \le \frac{2^{-\tau + 1} K(l, \tau)}{l^{3/2}}, \qquad \forall \tau: \lim_{l \to \infty} K(l, \tau) = 1 \qquad (8)$$

For $p = \frac{1}{2}$ this can be strengthened to

$$P_A(l) \le \frac{2^{-\tau + 1}}{l^{3/2}} \qquad (9)$$

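These bounds are tight in the sense that

$$\lim_{l \to \infty} \frac{\ln P_A(l)}{-\frac{3}{2} \ln l} = 1$$

Proof: With $b$ the threshold in (6), $P_A(l) = P(|\hat{p} - p| > b)$. The Chernoff bound (e.g., [37]) states, for the upper tail,

$$P\left( \sum_{i=1}^{l} X_i \ge l(p + b) \right) \le \left\{ \inf_{s > 0} e^{-sp - sb} M_X(s) \right\}^{l}$$

where, as usual, $q = 1 - p$,

$$b = \sqrt{\frac{pq \ln 4 \left( \tau + \frac{3}{2} \log l \right)}{l}}$$

and $M_X(s)$ is the moment generating function of $X_i$, which for a Bernoulli random variable is

$$M_X(s) = p e^{s} + q$$

Then

$$P\left( \hat{p} - p \ge b \right) \le \inf_{s > 0} \left\{ \exp(-s(p + b)) \left( p e^{s} + q \right) \right\}^{l}$$

Minimizing over $s$ gives

$$P\left( \hat{p} - p \ge b \right) \le \left( \frac{q}{q - b} \left( \frac{p(q - b)}{q(p + b)} \right)^{p + b} \right)^{l}$$

or

$$\frac{1}{l} \ln P\left( \hat{p} - p \ge b \right) \le \ln \frac{q}{q - b} + (p + b) \ln \frac{p(q - b)}{q(p + b)} = -\frac{b^2}{2pq} + O(b^3) = -\frac{\tau \ln 2}{l} - \frac{3}{2} \frac{\ln l}{l} + O(b^3)$$

where we have used $\ln(1 + x) \le x - \frac{x^2}{2} + \frac{x^3}{3}$ for $x \ge 0$. The lower tail is bounded in the same way, which gives the factor 2 in (8); the remainder term satisfies $l \cdot O(b^3) \to 0$ and is collected in $K(l, \tau)$.

For $p = \frac{1}{2}$ Hoeffding's inequality [38] gives the bound

$$P_A(l) \le 2 \exp\left( -2 l b^2 \right) = 2 \exp\left( -\ln 2 \left( \tau + \frac{3}{2} \log l \right) \right) \qquad (13)$$

The equation directly leads to (9). For $p = \frac{1}{2}$ this is tighter than the Chernoff bound above.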
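As a numerical sanity check of the bound for $p = \frac{1}{2}$, the probability of the event in criterion (6) can be summed exactly from the binomial distribution and compared with $2^{-\tau+1} / l^{3/2}$. This is a small sketch under stated assumptions: the function names are hypothetical, and the exact sum is only practical for moderate $l$ (here up to about 1000) because of floating point range.

```python
import math


def threshold(l, tau, p=0.5):
    """Deviation b in criterion (6): sqrt(p q ln4 (tau + 3/2 log2 l) / l)."""
    return math.sqrt(p * (1 - p) * math.log(4) * (tau + 1.5 * math.log2(l)) / l)


def prob_atypical(l, tau, p=0.5):
    """Exact P(|p_hat - p| > b) for an iid Bernoulli(p) sequence of length l."""
    b = threshold(l, tau, p)
    return sum(
        math.comb(l, k) * p**k * (1 - p) ** (l - k)
        for k in range(l + 1)
        if abs(k / l - p) > b
    )


def hoeffding_bound(l, tau):
    """Upper bound 2^(-tau+1) / l^(3/2), stated for p = 1/2."""
    return 2.0 ** (-tau + 1) / l**1.5


for l in (10, 50, 200, 1000):
    print(l, prob_atypical(l, tau=1), hoeffding_bound(l, tau=1))
```

The exact probability is not monotone in $l$ because $\hat{p}$ only takes values on a grid, so the comparison is best read as a check that the bound holds at each length rather than as a measure of its tightness.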

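For the lower bound we use moderate deviations from [39]. Define $\tilde{X}_i = \frac{X_i - p}{\sqrt{pq}}$. We can then rewrite (7) as

$$\left| \frac{1}{\sqrt{l}} \sum_{i=1}^{l} \tilde{X}_i \right| > \sqrt{2\tau \ln 2 + 3 \ln l}$$

We define $a_l = \frac{1}{2\tau \ln 2 + 3 \ln l}$, which satisfies $\lim_{l \to \infty} a_l = 0$, $\lim_{l \to \infty} l a_l = \infty$. Using this as $a_l$ in [39, Theorem 3.7.] gives

$$\liminf_{l \to \infty} \frac{1}{2\tau \ln 2 + 3 \ln l} \ln P\left( \left| \frac{1}{\sqrt{l}} \sum_{i=1}^{l} \tilde{X}_i \right| > \sqrt{2\tau \ln 2 + 3 \ln l} \right) = \liminf_{l \to \infty} \frac{1}{3 \ln l} \ln P\left( \left| \frac{1}{\sqrt{l}} \sum_{i=1}^{l} \tilde{X}_i \right| > \sqrt{2\tau \ln 2 + 3 \ln l} \right) \ge -\frac{1}{2}$$

Together with the upper bound, this gives the limit stated in the theorem.

Figure 1 compares the upper bound with simulations.

Fig. 1. Simulated $P_A(l)$ and the upper bound for $\tau = $ [illegible], $p = 0.3$ (curves: upper bound, simulation; horizontal axis: $l$).

We can also bound the miss probability for extrinsically atypical sequences as follows.

Theorem 3. Suppose that the typical sequence is an iid $\{0, 1\}$-sequence with $P(X_n = 1) = p$. Let the test sequence be iid with $P(X_n = 1) = p_a$. The probability that the test sequence is missed according to criterion (6) is upper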
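bounded by

$$P_M(l) \le \frac{2^{-\tau}}{l^{3/2}} \left( \frac{q_a\, p}{p_a\, q} \right)^{\sqrt{l pq (2\tau \ln 2 + 3 \ln l)}} \left( \left( \frac{p_a}{p} \right)^{p} \left( \frac{q_a}{q} \right)^{q} \right)^{l} K(l, \tau), \qquad \forall \tau: \lim_{l \to \infty} K(l, \tau) = 1 \qquad (14)$$

where $q_a = 1 - p_a$.

Proof: We may assume that $p_a < p$. Similarly to the proof of Theorem 2 the Chernoff bound is

$$P_M \le \inf_{s > 0} \left\{ \exp(-s(p - b)) \left( p_a e^{s} + q_a \right) \right\}^{l}$$

Minimizing over $s$ gives

$$P_M \le \left( \frac{q_a}{q + b} \left( \frac{q_a (p - b)}{p_a (q + b)} \right)^{-p + b} \right)^{l}$$

or

$$\frac{1}{l} \ln P_M \le \ln \frac{q_a}{q + b} - (p - b) \ln \frac{q_a (p - b)}{p_a (q + b)} = p \ln \frac{p_a}{p} + q \ln \frac{q_a}{q} + b \ln \frac{q_a\, p}{p_a\, q} - \frac{b^2}{2pq} + O(b^3)$$

using series expansions.

A. Hypothesis testing interpretation

The solution (5) may seem arbitrary, but it has a nice interpretation in terms of hypothesis testing [40]. Return to the solution (1). That solution gives a test for a given $l$. However, the problem is that it does not reconcile tests for different $l$. One way to solve that issue is to consider $l$ a random variable, i.e., introducing a prior distribution in the Bayesian sense. Let the prior distribution of $l$ be $P_L(l)$. The equation (1) now becomes

$$L = \log \frac{\prod_{b=0}^{1} \hat{P}(b)^{N(b \mid x)}\, P_L(l \mid 1)}{\prod_{b=0}^{1} P(b)^{N(b \mid x)}\, P_L(l \mid 0)} = l \sum_{b=0}^{1} \hat{P}(b) \log \hat{P}(b) - l \sum_{b=0}^{1} \hat{P}(b) \log P(b) + \log P_L(l \mid 1) - \log P_L(l \mid 0)$$

$$= l\, D(\hat{p} \| p) + \log P_L(l \mid 1) - \log P_L(l \mid 0)$$

The hypothesis test now is

$$l\, D(\hat{p} \| p) > \tau + \log P_L(l \mid 0) - \log P_L(l \mid 1) \qquad (15)$$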

Of course, the problem is that we do not know $P_L(l)$. Still, compare (15) with (5) without the approximations,

$$l\, D(\hat{p} \| p) > \tau + \frac{1}{2} \log l + c + \log^* l \qquad (16)$$

To the term $c + \log^* l$ corresponds a distribution on the integers, namely $Q(l)$ in [8, 3.6]. Except for the term $\frac{1}{2} \log l$, the equations (15) and (16) are identical if we use the prior distribution $P_L(l \mid 1) = Q(l)$. Rissanen [8] argues that the distribution $Q(l)$ is the most reasonable distribution on the integers when we have really no prior knowledge, mainly from a coding point of view. This therefore seems a reasonable distribution for $P_L(l)$.

What about the term $\frac{1}{2} \log l$? The model for the non-null hypothesis has one unknown parameter, $p$, so that it is more complex than the null hypothesis. We have to account for this additional complexity. Our goal is to find an explanation for atypical sequences among a large class of explanations, not just the distribution of zeros and ones. If there is no penalty for finding a complex explanation, any data can be explained, and all data will be atypical. This is Occam's razor [6]. The penalty for one unknown parameter, as argued by Rissanen, is exactly $\frac{1}{2} \log l$. We therefore have the following explanation for (5).

Fact 4. The criterion (5) can be understood as a hypothesis test with prior distribution $Q(l)$ [8] and penalty $\frac{1}{2} \log l$ for the unknown parameter.

Seen in this light, Theorem 2 is not surprising. In (5) we have replaced $\frac{1}{2} \log l + \log^* l$ with $\frac{3}{2} \log l$, which implicitly corresponds to the prior distribution $P_L(l) \propto l^{-3/2}$, which is exactly the distribution seen in (9).

B. Atypical subsequences

One problem where we believe our approach excels is in finding atypical subsequences of long sequences. The difficulty in finding atypical subsequences is that we may have short subsequences that deviate much from the typical model, and long subsequences that deviate little. How do we choose among these? Definition 1 gives a precise answer.

For the formal problem statement, consider a sequence $\{x_n, n = 1, \ldots, \infty\}$ from a finite alphabet $\mathcal{A}$, where in this section $\mathcal{A} = \{0, 1\}$. The sequence is generated according to a probability law $P$, which is known. In this sequence are embedded infrequent finite subsequences $\mathcal{X}_i = \{x_n, n = n_i, \ldots, n_i + l_i - 1\}$ from the finite alphabet $\mathcal{A}$, which are generated by an alternative probability law $P_\theta$. The probability law $P_\theta$ is unknown, but it might be known to be from a certain class of probability distributions, for example parametrized by the parameter $\theta$. Each subsequence $\mathcal{X}_i$ may be drawn from a different probability law. The problem we consider is to isolate these subsequences, which we call atypical subsequences.

In this section, as above, we will assume both $P$ and $P_\theta$ are binary iid. The solution is very similar to the one for variable length sequences above. The atypical subsequences are encoded with the universal source coder from [6] with a code length $L_a = l H(\hat{p}) + \frac{1}{2} \log l$. The start of the sequence is encoded with an extra symbol '.', which has a code length $-\log P(\text{'.'})$, and the length is encoded in $\log^* l$ bits. In conclusion we end up with exactly the same criterion as (5), repeated here

$$D(\hat{p} \| p) > \frac{\tau + \frac{3}{2} \log l}{l} \qquad (17)$$
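The criterion for subsequences can be applied by brute force to every window of a binary sequence. The sketch below is illustrative only and is not the authors' search procedure: the function names, the minimum window length, and the exhaustive $O(n^2)$ scan are assumptions chosen for clarity rather than efficiency.

```python
import math


def kl_bits(p_hat, p):
    """Relative entropy D(p_hat || p) in bits for binary distributions."""
    d = 0.0
    if p_hat > 0:
        d += p_hat * math.log2(p_hat / p)
    if p_hat < 1:
        d += (1 - p_hat) * math.log2((1 - p_hat) / (1 - p))
    return d


def atypical_subsequences(x, p, tau, min_len=2):
    """Return (start, length) of every window with l * D > tau + 3/2 log2 l."""
    n = len(x)
    prefix = [0]
    for bit in x:
        prefix.append(prefix[-1] + bit)  # prefix[i] = number of ones in x[:i]
    hits = []
    for start in range(n):
        for l in range(min_len, n - start + 1):
            ones = prefix[start + l] - prefix[start]
            if l * kl_bits(ones / l, p) > tau + 1.5 * math.log2(l):
                hits.append((start, l))
    return hits


def flagged_samples(x, p, tau):
    """Mark each sample that lies in at least one atypical window."""
    flags = [False] * len(x)
    for start, l in atypical_subsequences(x, p, tau):
        for i in range(start, start + l):
            flags[i] = True
    return flags


# A run of 20 ones embedded in an otherwise balanced sequence.
x = [0, 1] * 20 + [1] * 20 + [0, 1] * 20
flags = flagged_samples(x, p=0.5, tau=5)
print(flags[50], flags[0])
```

Samples just outside the embedded run can also be flagged, since a slightly longer window that contains the run may still satisfy the criterion.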

The only difference from (5) is that $\tau$ has a slightly different meaning.

For the subsequence problem, a central question is what the probability is that a given sample $x_n$ is part of an intrinsically atypical subsequence. Notice that there are infinitely many subsequences that can contain $x_n$, and each of these has a probability of being atypical given by Theorem 2. We can obtain an upper bound as follows. Let us say that $X_n$ has been determined to be part of an atypical sequence $\mathcal{X}_i$. It is clear that the sequence $\mathcal{X}_i$ must also be atypical according to (17). Therefore, we can upper bound the probability $P_A(X_n)$ that $X_n$ is part of an atypical sequence with the probability of the event (17), using the approximate criterion (6),

$$\exists\, l,\ n_1 \le n < n_1 + l: \quad \left| \frac{1}{\sqrt{l}} \sum_{i=n_1}^{n_1 + l - 1} \frac{X_i - p}{\sqrt{pq}} \right| > \sqrt{2\tau \ln 2 + 3 \ln l}$$

We can rewrite this as

$$\exists\, l,\ n_1 \le n < n_1 + l: \quad \left| \sum_{i=n_1}^{n_1 + l - 1} (X_i - p) \right| > \sqrt{2 l pq \ln 2 \left( \tau + \frac{3}{2} \log l \right)}$$

We could upper bound this with a union bound using Theorem 2. However, it is quickly seen that this does not converge. The problem is that the events in the union bound are highly dependent, so we need a slightly more refined approach; this results in the following theorem.

Theorem 5. Consider the case $p = \frac{1}{2}$. The probability $P_A(X_n)$ that a given sample $X_n$ is part of an atypical subsequence is upper bounded by

$$P_A(X_n) \le \left( K_1 \sqrt{\tau} + K_2 \right) 2^{-\tau} \qquad (18)$$

for some constants $K_1$, $K_2$.

Proof: Without loss of generality we can assume $n = 0$. For some $l_0 > 0$ let $I_{l_0}$ be the set of subsequences containing $X_0$ of length at most $l_0$. For $i \in I_{l_0}$ let $l_i$ be the length of the subinterval. From Theorem 2 we know that $P_A(l_i) \le \frac{2^{-\tau + 1}}{l_i^{3/2}} K(l_i, \tau)$ and therefore $\sum_{i \in I_{l_0}} P_A(l_i) \le K 2^{-\tau}$ for some constant $K$. This argument does not work if we allow arbitrarily long subsequences, because the sum is divergent. However, we can write

$$P_A(X_0) \le \sum_{i \in I_{l_0}} P_A(l_i) + P_{A, l_0}(X_0) \le K 2^{-\tau} + P_{A, l_0}(X_0)$$

where $P_{A, l_0}(X_0)$ is the probability that $X_0$ is in an atypical subsequence of at least length $l_0$. The proof will be to bound $P_{A, l_0}(X_0)$.
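The divergence of the plain union bound can be seen numerically. The sketch below assumes that exactly $l$ windows of length $l$ contain a given sample, applies the $p = \frac{1}{2}$ bound $2^{-\tau+1} / l^{3/2}$ to each of them, and starts the sum at length 2; the function name is hypothetical.

```python
def union_bound(l0, tau):
    """Sum of per-window bounds over all windows of length 2..l0 containing X_0.

    There are l windows of length l containing a fixed sample, so each
    length contributes l * 2^(-tau+1) / l^(3/2) = 2^(-tau+1) / sqrt(l).
    """
    return sum(l * 2.0 ** (-tau + 1) / l**1.5 for l in range(2, l0 + 1))


for l0 in (10**2, 10**3, 10**4):
    print(l0, union_bound(l0, tau=3))
```

The partial sums grow roughly like $\sqrt{l_0}$ and eventually exceed 1, so the bound is only useful for a fixed maximum length $l_0$, which is why the long subsequences are handled separately.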

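Define the following events

$$A^{+}_{n_1,l} = \left\{\sum_{i=n_1}^{n_1+l-1}(X_i - p) > \sqrt{2l\,pq\,\ln 2\left(\tau + \tfrac{3}{2}\log l\right)}\right\}$$

$$A^{-}_{n_1,l} = \left\{\sum_{i=n_1}^{n_1+l-1}(X_i - p) < -\sqrt{2l\,pq\,\ln 2\left(\tau + \tfrac{3}{2}\log l\right)}\right\}$$

and let $A_{n_1,l} = A^{+}_{n_1,l} \cup A^{-}_{n_1,l}$. For $p = \frac{1}{2}$ we can rewrite

$$\sum_{i=n_1}^{n_1+l-1}(X_i - p) > \sqrt{2l\,pq\,\ln 2\left(\tau + \tfrac{3}{2}\log l\right)} \iff \sum_{i=n_1}^{n_1+l-1}(2X_i - 1) > \sqrt{2l\,\ln 2\left(\tau + \tfrac{3}{2}\log l\right)} \tag{19}$$

For ease of notation define

$$\upsilon_l = \sqrt{2l\,\ln 2\left(\tau + \tfrac{3}{2}\log l\right)}$$

Then using the union bound we can write

$$P_A(X_0) = P\left(\bigcup_{n_1=-\infty}^{0}\ \bigcup_{l=-n_1+1}^{\infty} A_{n_1,l}\right) \le \sum_{n_1=-\infty}^{0} P\left(\bigcup_{l=-n_1+1}^{\infty} A_{n_1,l}\right) = \sum_{n_1=-\infty}^{0}\sum_{l=-n_1+1}^{\infty} P\left(A_{n_1,l} \cap \bigcap_{l'=-n_1+1}^{l-1} A^c_{n_1,l'}\right)$$

$$= \sum_{n_1=-\infty}^{0}\sum_{l=-n_1+1}^{\infty} P\left(A_{n_1,l}\,\Big|\,\bigcap_{l'=-n_1+1}^{l-1} A^c_{n_1,l'}\right) P\left(\bigcap_{l'=-n_1+1}^{l-1} A^c_{n_1,l'}\right) \le \sum_{n_1=-\infty}^{0}\sum_{l=-n_1+1}^{\infty} P\left(A_{n_1,l}\,\Big|\,\bigcap_{l'=-n_1+1}^{l-1} A^c_{n_1,l'}\right)$$

where we have excluded the length one sequence consisting of $X_0$ itself. Now consider

$$P\left(A_{n_1,l}\,\Big|\,\bigcap_{l'=-n_1+1}^{l-1} A^c_{n_1,l'}\right) = 2P\left(A^{+}_{n_1,l}\,\Big|\,\bigcap_{l'=-n_1+1}^{l-1} A^c_{n_1,l'}\right)$$

We can think of $S_l = \sum_{i=n_1}^{n_1+l-1}(2X_i - 1)$ as a simple random walk [4], and we will use this to upper bound the probability $P\left(A^{+}_{n_1,l} \mid \bigcap_{l'=-n_1+1}^{l-1} A^c_{n_1,l'}\right)$. This probability can be interpreted as the probability that the random walk passes $\upsilon_l$ given that it was below $\upsilon_{l'}$ at times $-n_1 < l' < l$. But since the random walk can increase by at most one, and since the threshold is increasing with $l$, that means that at time $l$ we must have $S_l = \upsilon_l$. Furthermore, it is easy to see that the probability is upper bounded by the probability that $S_l = \upsilon_l$ given that the random walk is below $\upsilon_l$ at times $-n_1 < l' < l$. Thus

$$P\left(A^{+}_{n_1,l}\,\Big|\,\bigcap_{l'=-n_1+1}^{l-1} A^c_{n_1,l'}\right) \le P\left(S_l = \upsilon_l \mid S_{l'} < \upsilon_l,\ -n_1 < l' < l\right) = \frac{P\left(S_l = \upsilon_l,\ S_{l'} < \upsilon_l,\ -n_1 < l' < l\right)}{P\left(S_{l'} < \upsilon_l,\ -n_1 < l' < l\right)} \le \frac{P\left(S_l = \upsilon_l,\ S_{l'} < \upsilon_l,\ -n_1 < l' < l\right)}{P\left(S_{l'} < \upsilon_l,\ 0 < l' < l\right)} \tag{20}$$

The denominator can be interpreted as the probability that the maximum of the random walk stays below $\upsilon_l$, which by the fixed-length theorem can be bounded as

$$P_D = P\left(S_{l'} < \upsilon_l,\ 0 < l' < l\right) = 1 - 2P(S_l \ge \upsilon_l) + P(S_l = \upsilon_l) \ge 1 - 2^{-\tau+c}\, l^{-3/2} > 0$$

for $\tau$ and $l$ sufficiently large, and where $c$ is some constant. Since, as discussed at the start of the proof, we can assume that $l \ge l_0$, we can choose $l_0$ large enough that this is satisfied; furthermore, since $P_D$ is increasing in $\tau$, we can choose $l_0$ independent of $\tau$ as long as $\tau$ is sufficiently large.

We will next upper bound the numerator in (20). This is the probability that we have a path that has stayed below $\upsilon_l$ at steps $-n_1 < l' < l$, but then at step $l$ hits $\upsilon_l$. We will count such paths. We divide them into two groups that we count separately. The first group consists of all paths that start at zero and hit $\upsilon_l$ for the first time after $l$ steps. The second group is more easily described in reverse time. Those are paths that start at $\upsilon_l$ at step $l$, then stay below $\upsilon_l$ until some time $\tilde n \le -n_1$, when they hit $\upsilon_l$ again, and finally hit 0 at time 0. According to [4, Section 3.0] we can count all these paths by

$$N = \frac{\upsilon_l}{l} N_l(0, \upsilon_l) + \sum_{t=\upsilon_l}^{-n_1+1} \frac{1}{l-t}\, N_{l-t}(1, 0)\, N_t(0, \upsilon_l) \tag{21}$$

where $N_n(a, b)$ is the number of length $n$ paths between $a$ and $b$. We need to upper bound the probability $P(S_n = k)$ that a path starting at 0 hits $k$ after $n$ steps. We use [4,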

17 7 Section 3.0] and [6, 3.] to get P S n = k =N n 0, k n n = n n + k n n + kn knh = π 4 4n πn k nh We can bound the power of the exponent to as foows Thus, where e x = x. n + k nh n =n H + k n n n k n = k n n n+k n n n+k n n n P S n = k πn e n k n We wi use this to bound the probabiity of set of paths in the second term in. We can bound P n, = n + t=υ n + t=υ t N t, 0N t 0, υ t π t e n t N t 0, υ t n 4 + P S π + n 3/ t = υ Here the sum n + t=υ P S t = υ when ooked at in reverse time can be interpreted as the probabiity of a path t=υ starting at υ hits zero before time n +. We can the write this as See [4, Section 3.0] n + t=υ P S t = υ = P M n+ υ P S n+ υ

18 8 We can use the proof of Theorem, specificay 3 to bound this by P M n+ υ exp υ n + Then P n, K exp υ n + + n 3/ n + We wi next bound the probabiity of the paths in the first term in. We have 4 P S = υ π υ e υ n 4 e υ π υ n = π υ τ and Thus P = υ P S = υ n τ + 3 og π 8τ n τ 5/ π υ n + π υ υ 4 π τ 5/ τ n K = n + = n + n = n +. =S n, τ P A c n, = n + P P n, P D P P n, A c n,

19 9 and where K > 0 is some constant. P A X 0 K n = e S n,τ n = = n + + K First we evauate the sum of P. The term υ P n = = n + P n, 3 is decreasing in, so for sufficienty arge, υ. We can evauate the sum separatey for 0 and for >. Convergence depends ony on the atter tai. The threshod is increasing with τ. If for exampe we put = 8τ n, i.e., proportiona to τ, we have υ for τ > 0. Therefore For > we can write Then for n + > = 0 P Kτ τ 6τ n 4 n 4 P = τ 5/ + π π π τ 5/ =k K = n + P k τ nn n n k τ erfc n n where k i > 0 are some constants and where we have used 5/ x 5/ dx = 3k 3/ + k 3 τ τ n 3 4 k n 5/ n xx 5/ dx =k = 9 k 6πerfc [ 3 n k ] + 6 n k k 3/ as it can be verified that a three sums, when 4 is inserted in 3, are convergent, using fxdx. k= fk f +

We bound the second sum in (23) by inserting the bound on the second group of paths, ignoring the small constants, and replacing the sums over $n_1$ and $l$ by integrals. The remaining integral is clearly convergent, and decreasing in $\tau$. Therefore the second sum is bounded by $K 2^{-\tau}$ for some constant $K$.

There are two important implications of Theorem 5. First is that for $\tau$ sufficiently large, $P_A(X_n) < 1$, and in fact $P_A(X_n)$ can be made arbitrarily small for large enough $\tau$. This is an important theoretical validation of Definition 1 and the resulting criteria (5) and (6). If the theory had resulted in $P_A(X_n) = 1$ then everything would be atypical, and atypicality would be meaningless. That this is not trivially satisfied is shown by Proposition 6 just below. What that Proposition says is that if in the above equation instead of $\frac{3}{2}\log l$ we had had $\frac{1}{2}\log l$, then everything would have been atypical. Now, $\frac{1}{2}\log l$ corresponds to forgetting that the length of an atypical sequence also needs to be encoded for the resulting sequence to be decodable. Thus, it is the strict adherence to decodability that has led to a meaningful criterion. So, although decodability at first seems unrelated to detection, it turns out to be of crucial importance. Similarly, at first the term $\frac{3}{2}\log l$ may have seemed arbitrary. However, it is just within a margin sufficient to ensure that not everything becomes atypical.

The second important implication of Theorem 5 is that it validates the meaning of $\tau$. The way we introduced $\tau$ was as the number of bits needed to encode the fact that an atypical sequence starts, and therefore we should put $\tau = -\log P(\text{atypical sequence starts})$. Theorem 5 confirms that $\tau$ has the desired meaning for purely random sequences. The reason this is not trivial is that $\tau$ was chosen from the probability of an atypical sequence, while Theorem 5 gives the probability of a sample being atypical.
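The dependence of $P_A(X_n)$ on $\tau$ and on the coefficient of $\log l$ can be explored with a small Monte Carlo experiment. The sketch below is an illustration only, under stated assumptions: it uses the approximate random-walk criterion $|S_l| > \sqrt{2l\ln 2\,(\tau + \alpha\log l)}$ for $p = \frac{1}{2}$, it only examines windows inside a short finite record (so it underestimates the infinite-sequence probability), and the function name and default parameters are hypothetical.

```python
import math
import random

def prob_sample_atypical(tau, alpha=1.5, length=41, trials=200, seed=0):
    """Estimate the probability that the middle sample of an iid uniform
    binary record lies in some window violating the approximate criterion."""
    rng = random.Random(seed)
    mid = length // 2
    hits = 0
    for _ in range(trials):
        # +/-1 steps of the simple random walk, i.e. 2*X_i - 1
        steps = [rng.choice((-1, 1)) for _ in range(length)]
        prefix = [0]
        for s in steps:
            prefix.append(prefix[-1] + s)
        found = False
        for start in range(mid + 1):
            # window steps[start:end] contains index mid
            for end in range(mid + 1, length + 1):
                l = end - start
                if l < 2:
                    continue  # length-one window is excluded
                threshold = math.sqrt(
                    2 * l * math.log(2) * (tau + alpha * math.log2(l)))
                if abs(prefix[end] - prefix[start]) > threshold:
                    found = True
                    break
            if found:
                break
        hits += found
    return hits / trials
```

Because the same seed produces the same records, increasing `tau` or `alpha` can only raise the thresholds, so the estimate is non-increasing in both. Reproducing the asymptotic behavior discussed in the text would require far longer records than this sketch uses.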

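Proposition 6. Consider the case $p = \frac{1}{2}$. Suppose instead of (17) we use the criterion

$$\left|\sum_{i=n_1}^{n_1+l-1}(X_i - p)\right| > \sqrt{\frac{l}{2}\left(\tau\ln 2 + \alpha\ln l\right)} \tag{25}$$

with $\alpha = \frac{3}{2}$ giving (17). Then if $\alpha \le \frac{1}{2}$, the probability that a given sample $X_n$ is part of an atypical subsequence is $P_A(X_n) = 1$.

Proof: We can assume that $n = 0$. We will continue with the random walk framework from the proof of Theorem 5. Define the events

$$\bar A_l = \left\{\sum_{i=-l+1}^{0}(2X_i - 1) > \sqrt{2l\,\ln 2\left(\tau + \alpha\log l\right)}\right\}$$

$$\underline{A}_l = \left\{\sum_{i=-l+1}^{0}(2X_i - 1) < -\sqrt{2l\,\ln 2\left(\tau + \alpha\log l\right)}\right\}$$

and

$$\upsilon_l = \sqrt{2l\,\ln 2\left(\tau + \alpha\log l\right)}$$

Then

$$P_A(X_0) \ge P\left(\bigcup_{l=0}^{\infty} \bar A_l \cup \underline{A}_l\right)$$

Namely, we declare that $X_0$ is atypical if it is the endpoint of an atypical sequence $\{x[-l], x[-l+1], \ldots, x[0]\}$ for some $l$. Clearly, $X_0$ could be the start or midpoint of an atypical sequence, so this is a rather loose lower bound. Now we can write

$$P\left(\bigcup_{l=0}^{\infty} \bar A_l \cup \underline{A}_l\right) = 1 - P\left(\bigcap_{l=0}^{\infty} \bar A^c_l \cap \underline{A}^c_l\right) = 1 - \prod_{l=0}^{\infty} P\left(\bar A^c_l \cap \underline{A}^c_l\,\Big|\,\bigcap_{k=0}^{l-1} \bar A^c_k \cap \underline{A}^c_k\right) = 1 - \prod_{l=0}^{\infty}\left[1 - P\left(\bar A_l \cup \underline{A}_l\,\Big|\,\bigcap_{k=0}^{l-1} \bar A^c_k \cap \underline{A}^c_k\right)\right]$$

Consider the probability $P\left(\bar A_l \cup \underline{A}_l \mid \bigcap_{k=0}^{l-1} \bar A^c_k \cap \underline{A}^c_k\right)$. The only way the conditional event can happen is if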

22 S = υ and X = or S = υ + and X =. Here we have Here P S = υ = N 0, υ + = + + υ H + υ υ = H υ H +υ + H = H n + υ + +υ + +υ + + υ υ 3 υ + o υ 3 + ɛ υ + υ + + υ3 ɛ = υ n n = υ n + υ3 ɛ υ υ3 n Where the ast inequaity is ony true for sufficienty arge, as for some 0 we have > 0 : ɛ <. Then P S = υ τ α υ 3 = τ α+ υ3 And n P Ā A n = =0 =0 τ α+ υ3 τ α+ υ3 =0

Here $\lim_{l\to\infty} \upsilon_l^3 / l^2 = 0$. So, for example, for $l$ sufficiently large, $\upsilon_l^3 / l^2 \le 1$. Then

$$\ln P\left(\bigcap_{l=0}^{\infty} \bar A^c_l \cap \underline{A}^c_l\right) \le -K\sum_{l=l_0}^{\infty} 2^{-\tau}\, l^{-\left(\alpha + \frac{1}{2}\right)}$$

for some constant $K > 0$. This is divergent for $\alpha \le \frac{1}{2}$, proving that $P\left(\bigcup_{l=0}^{\infty} \bar A_l \cup \underline{A}_l\right) = 1$.

Theorem 5 states that for $\alpha = \frac{3}{2}$, $P_A(X_n) < 1$ (convergence), while Proposition 6 shows that for $\alpha = \frac{1}{2}$, $P_A(X_n) = 1$ (divergence). There is a gap between those values of $\alpha$ that is hard to fill in theoretically. We have therefore tested it numerically; see the figure below. Of course, testing convergence numerically is not quite well-posed. Still, the figure indicates that the phase transition between divergence and convergence happens right around $\alpha = 1$.

[Figure: probability of atypicality of the sample $X_n$ as a function of $\alpha$ for different values of $\tau$, with sequence length $10^5 + 1$ and $n = \text{length}/2$ (the middle). Caption: Transition between divergence and convergence as a function of $\alpha$.]

C. Recursive coding

Instead of using Definition 1 directly, we could approach the problem as follows. First the sequence is encoded with the typical code. Now, if the distribution of the sequence is in agreement with the typical code, the result should be a sequence of iid binary bits with $P(X_i = 1) = \frac{1}{2}$ [6], i.e., a purely random sequence; and this sequence cannot be further encoded. We can now test whether we can further encode the sequence with a universal code. If so, we categorize the sequence as atypical. Let $\tilde l$ be the length of the sequence after typical coding. In (4) the typical and atypical codelengths are therefore

$$L_t = \tilde l - \log\epsilon, \qquad L_a = \tilde l\, H(\hat{\tilde p}) + \frac{3}{2}\log\tilde l - \log\epsilon + \tau \tag{26}$$

Here $\hat{\tilde p}$ is the estimated $p$ for the encoded sequence. Now

$$\tilde l = l\left(\hat p\log\frac{1}{p} + (1-\hat p)\log\frac{1}{1-p}\right) \tag{27}$$

$$\tilde l\, H(\hat{\tilde p}) \approx l\, H(\hat p) \tag{28}$$

$$\log\tilde l = \log l + \log\left(\hat p\log\frac{1}{p} + (1-\hat p)\log\frac{1}{1-p}\right) \tag{29}$$

$$\approx \log l \tag{30}$$

The argument for (28) is as follows, without doing detailed calculations: if we encode a sequence with a wrong code and then later re-encode with the correct code for the induced statistic, the result is the same as originally encoding with the correct code. Thus the criteria (4) and (26) are approximately equivalent. We can state this as follows.

Proposition 7. Definition 1 can be applied to encoded sequences instead of the original data.

This of course ignores all integer constraints, block boundaries etc. But the importance of this statement is that it is sometimes easier to operate on partially encoded sequences, simply because the amount of data has already been reduced and the problem has been standardized. We do not need to know the typical codebook or even the model of typical data, since everything under the typical model has been reduced to a stream of iid binary digits, and atypicality algorithms can therefore be applied to data streams without knowledge of what the original data is. It also means that theoretical results such as Theorem 5, where we assume typical data is iid uniform, have general applicability.

However, first encoding the sequence and then doing atypicality detection also has disadvantages in a practical, finite length setting. Atypical subsequences become embedded in typical sequences in unpredictable ways. For example, it could be difficult to determine where exactly an atypical sequence starts and ends. Our practical implementation therefore uses Definition 1 directly.

IV. GENERAL CASE

Return to the problem considered at the start of Section III, where we are given a sequence $x$ of fixed length $l$ and we need to determine if it is atypical. In the iid case this is a simple hypothesis test problem, and the solution is given by the criterion derived there. In the general case we would like to find alternative explanations from a large abstract class of models. The issue is that it is often possible to fit an alternative model very well to the data if we just allow complex enough models: the well known Occam's razor problem [6]. Rissanen's MDL [4], [8], [9] is a solution to this problem. Therefore, in the general case, even for fixed length sequences, the problem is not a straightforward hypothesis test problem, and we have to resort to information theory.

A. Finite State Machines

One possible class of models in the general case is the class of finite state machines (FSM). Rissanen [7] defines the complexity of a sequence $x$ in the class of FSM sources by

$$I(x) = \min_j\left\{-\log\hat P(x \mid f_j) + \log j + c\right\} \tag{31}$$

where $f_1, f_2, \ldots$ is a sequence of state machines, and where we have used $\hat P(x \mid f_j)$ to emphasize that the probability is estimated. Rissanen uses Laplace's estimator, but the KT-estimator [43], [6] could also be used. Except for integer constraints, this is a valid descriptive length, and can therefore be used in Definition 1. This is a natural extension of the iid case considered in Section III. As opposed to Kolmogorov complexity, this complexity can actually be calculated, although at high computational cost. Because of that cost, it is mostly useful for theoretical considerations, and one result is the following generalization of the fixed-length iid theorem.

Theorem 8. Assume that the typical distribution is iid uniform. If the atypical descriptive length is given by (31) with a maximum number of states independent of $l$, the probability of an intrinsically atypical sequence $P_A(l)$ satisfies

$$\lim_{l\to\infty}\frac{\ln P_A(l)}{-\frac{3}{2}\ln l} = 1 \tag{32}$$

Proof: Since we consider all state machines with the number of states up to a certain maximum, this must also include the state machine with a single state. This is equivalent to the iid model in Section III, and we therefore get the lower bound in (32). The proof will be to upper bound the probability. As in Section III we use $\log l + \tau$ bits to indicate beginning and end of atypical sequences. The probability that a sequence $x$ is atypical therefore is

$$P_A(l) = P\left(I(x) + \log l + \tau < l\right) = P\left(\bigcup_{f_j}\left\{-\log\hat P(x \mid f_j) + \log j + c + \log l + \tau < l\right\}\right) \le \sum_{f_j} P\left(-\log\hat P(x \mid f_j) + \log l + \tau < l\right)$$

We will prove that $P\left(-\log\hat P(x \mid f_j) + \log l + \tau < l\right) \le K_j\, l^{-(k+2)/2}$ for constants $K_j$ and $k$ the number of states in the state machine, and since the slowest decay dominates, we get the upper bound for (32).
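To make the descriptive length (31) concrete, the following sketch evaluates it over a small, hypothetical family of state machines: machine $j = m + 1$ uses the previous $m$ bits as its state, for $m = 0, \ldots, 3$. This family, the all-zero initial state, the choice $c = 0$ and the function names are assumptions for illustration, not part of Rissanen's construction. With Laplace's estimator, the code length contributed by a state with $n_s$ visits and $n_s^0$ zeros is $\log\left((n_s + 1)\binom{n_s}{n_s^0}\right)$.

```python
import math

def laplace_codelength(x, order):
    """Bits needed to code the 0/1 list x with Laplace's estimator, using a
    machine whose state is the previous `order` bits (assumed family)."""
    counts = {}
    state = (0,) * order  # assumed all-zero initial state
    for bit in x:
        n0, n1 = counts.get(state, (0, 0))
        counts[state] = (n0 + (bit == 0), n1 + (bit == 1))
        if order:
            state = state[1:] + (bit,)
    return sum(math.log2((n0 + n1 + 1) * math.comb(n0 + n1, n0))
               for n0, n1 in counts.values())

def fsm_complexity(x, max_order=3, c=0.0):
    """min over j of -log P_hat(x | f_j) + log j + c, with j = order + 1."""
    return min(laplace_codelength(x, m) + math.log2(m + 1) + c
               for m in range(max_order + 1))

def is_atypical_fsm(x, tau):
    """Atypical if I(x) + log l + tau < l (typical model iid uniform)."""
    l = len(x)
    return fsm_complexity(x) + math.log2(l) + tau < l
```

An alternating sequence has no structure visible to the single-state machine, whose Laplace code length exceeds the $l$ bits of the typical code, but a machine that remembers one previous bit compresses it to a few bits.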
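For a fixed state machine $f$ the code length according to [7, 3.6] is

$$L(x \mid f) = \sum_s \log\binom{n_s(x)}{n_s^0(x)} + \sum_s \log\left(n_s(x) + 1\right)$$

where $n_s(x)$ denotes the number of occurrences of state $s$ in $x$ and $n_s^0(x)$ the number of times the next symbol is 0 at this state. Further, writing $n_s^1(x) = n_s(x) - n_s^0(x)$, from [6, 3.]

$$L(x \mid f) \ge \sum_s\left[n_s(x)\, H\!\left(\frac{n_s^0(x)}{n_s(x)}\right) - \frac{1}{2}\log n_s(x) - \frac{1}{2}\log\left(8\,\frac{n_s^0(x)}{n_s(x)}\,\frac{n_s^1(x)}{n_s(x)}\right) + \log\left(n_s(x) + 1\right)\right] \tag{33}$$

We want to upper bound the probability of the event $L(x \mid f) + \log l + \tau < l$. We can write $\log(n_s(x) + 1) - \frac{1}{2}\log n_s(x) = \frac{1}{2}\log l + \log\frac{n_s(x) + 1}{l} - \frac{1}{2}\log\frac{n_s(x)}{l}$. Let $r(x) = l - \sum_s n_s(x)\, H\!\left(\frac{n_s^0(x)}{n_s(x)}\right)$ and let $R(x)$ be the remaining small terms in (33) dependent on $x$,

$$R(x) = \sum_s\left[-\frac{1}{2}\log\left(8\,\frac{n_s^0(x)}{n_s(x)}\,\frac{n_s^1(x)}{n_s(x)}\right) + \log\frac{n_s(x) + 1}{l} - \frac{1}{2}\log\frac{n_s(x)}{l}\right]$$

Then we have to upper bound (notice that $\sum_s n_s(x) = l$)

$$P\left(r(x) - R(x) \ge \tau + \frac{k+2}{2}\log l\right)$$

The Chernoff bound is

$$P\left(r(x) - R(x) \ge \tau + \frac{k+2}{2}\log l\right) \le \exp\left(-t\left(\tau + \frac{k+2}{2}\log l\right)\right) M(t)$$

or

$$\ln P\left(r(x) - R(x) \ge \tau + \frac{k+2}{2}\log l\right) \le -t\left(\tau + \frac{k+2}{2}\log l\right) + \ln M(t)$$

where

$$M(t) = E\left[\exp\left(t\left(r(x) - R(x)\right)\right)\right]$$

In order to get a valid bound, we need to show that $M(t) < K < \infty$ independent of $l$ for $t < \ln 2$. Now it is easy to see that $\exp(-tR(x)) \le K < \infty$ for all $t$ and $l$. So, we have to show

$$E\left[\exp\left(t\, r(x)\right)\right] \le K < \infty$$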
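For the single-state (iid) machine the claim can be checked numerically. In that case $r(x) = l - lH(k/l)$, where $k$ is the number of zeros and is binomially distributed under the iid uniform typical model. The sketch below, with a hypothetical function name and with the small terms $R(x)$ ignored, evaluates $E[\exp(t\,r(x))]$ exactly by summing over $k$ in the log domain.

```python
import math

def h2(q):
    """Binary entropy in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def mgf_single_state(l, t):
    """E[exp(t * (l - l*H(k/l)))] for k ~ Binomial(l, 1/2)."""
    total = 0.0
    for k in range(l + 1):
        # log of binomial(l, k) * 2^(-l), computed with lgamma for stability
        log_prob = (math.lgamma(l + 1) - math.lgamma(k + 1)
                    - math.lgamma(l - k + 1) - l * math.log(2))
        r = l * (1.0 - h2(k / l))
        total += math.exp(log_prob + t * r)
    return total
```

For a fixed $t$ below $\ln 2 \approx 0.693$ the value should stay bounded as $l$ grows, which is the behavior the proof requires; as $t$ approaches $\ln 2$ the sum grows with $l$.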

We have to show that this is true for all state machines in the class of finite state machines with $k$ states, which can be done by showing

$$\max_{\text{FSM with } k \text{ states}} E\left[\exp\left(t\, r(x)\right)\right] \le K$$

It turns out to be easier to prove this if we expand the class over which we take the maximum, and clearly expanding the class does not decrease the maximum. A FSM with $k$ states is a function $f(x^m) \in \{1, \ldots, k\}$ that satisfies that if $f(x^m) = f(\tilde x^{\tilde m}) = s$ then $f(x^m b) = f(\tilde x^{\tilde m} b)$ for any bit $b$ [7], i.e., if the FSM is in state $s$ after $m$ steps, the next state transition is only dependent on the next bit, not how it got to state $s$. We extend the class by dispensing with this requirement. We can then describe the program we run as follows. Based on $x^m$ we choose a state $s_m \in \{1, \ldots, k\}$ without having any knowledge about $x_{m+1}$, except that it is independent and uniformly distributed by the assumption on the typical distribution. We can think of this slightly differently. The program puts $x_{m+1}$ into bucket $s \in \{1, \ldots, k\}$ and updates $n_s(m)$ and $n_s^i(m)$, in order to maximize $E[\exp(t\, r(x))]$. It does so based on past data $x^m$. Now, as opposed to the state machine setup, the choice of $s_m$ in no way restricts the choices of states or buckets $s_n$, $n > m$. Since the program has no knowledge of $x_{m+1}$, the program cannot optimize $s_m$ based on the values of $x^m$. Rather, it is sufficient to look at $n_s(m)$. It is now easy to see that the worst case is obtained if the bits are distributed evenly in the states. Thus, the worst case of $r(x)$ is

$$r(x) = l - \sum_s \frac{l}{k}\, H\!\left(\frac{n_s^0(x)}{l/k}\right)$$

where the $n_s^0(x)$ are independent across $s$. Thus, the problem is reduced to the case of a single state, which is showing that

$$E\left[\exp\left(t\left(l - lH\!\left(\tfrac{k}{l}\right)\right)\right)\right] \le K < \infty \tag{34}$$

where $k$ now denotes the number of zeros in $x$. Here we have

$$E\left[\exp\left(t\left(l - lH\!\left(\tfrac{k}{l}\right)\right)\right)\right] = \sum_{k=0}^{l}\binom{l}{k}2^{-l}\exp\left(t\,l\left(1 - H\!\left(\tfrac{k}{l}\right)\right)\right) \le 2e^{-(\ln 2 - t)l} + 2\sum_{k=1}^{l/2}\sqrt{\frac{l}{\pi k(l-k)}}\exp\left(-(\ln 2 - t)\,l\left(1 - H\!\left(\tfrac{k}{l}\right)\right)\right)$$

$$\le 2e^{-(\ln 2 - t)l} + 2\sum_{k=1}^{l/2}\sqrt{\frac{l}{\pi k(l-k)}}\exp\left(-\frac{\ln 2 - t}{\ln 2}\,\frac{(l-2k)^2}{2l}\right)$$

where we have used [6, 3.] and the bound on the exponent from the proof of Theorem 5. The sum is actually decreasing as a function of $l$, but this seems hard to prove. Instead we upper bound the sum by an integral: bounding the factor $\sqrt{l/(\pi k(l-k))}$ for $k \ge 1$, the sum is at most $K_1 + K_2$ times a Gaussian-type integral over $k$, for some constants $K_1$, $K_2$, using Gaussian moments. This proves (34).

[Figure: the probability of an intrinsically atypical sequence, $P_A(l)$, and its upper bound, plotted against length.] Fig. 3. Probability of an intrinsically atypical sequence. The typical distribution is iid uniform, and for detection of atypical sequences the CTW algorithm has been used (Section IV-B).

While the Theorem is for the typical model iid uniform, as outlined in Section III-C in principle it also applies to general sources, since we can first encode and then look for atypical sequences. The theorem shows that looking for more complex explanations for data does not essentially increase the probability of intrinsically atypical sequences. Fig. 3 (compare with the corresponding figure for the iid case) confirms this experimentally. The atypical detection is based on CTW, which, as explained in Section IV-B below, is a good approximation of FSM modeling. On the other hand, if one of the FSM models does in fact fit the data, the chance of detecting the sequence is greatly increased, although this is hard to quantify. If we think of intrinsically atypical sequences as false alarms, this shows the power of the methodology.

Since FSM sources have the same $P_A(l)$ as in the iid case, it seems reasonable to conjecture that Theorem 5 is still valid, that is, $P_A(X_n) < 1$ for sufficiently large $\tau$, which is clearly an essential theoretical property of atypicality. However, as Theorem 5 does not follow directly from the fixed-length theorem, verifying the conjecture requires a formal proof, which we do not have at present.

B. Atypical Encoding

In terms of coding, Definition 1 can be stated in the following form:

$$C(x \mid P) - C(x) > 0$$

Here $C(x \mid P)$ is the code length of $x$ encoded with the optimum coder according to the typical law, and $C(x)$ is the code length of $x$ encoded in itself. As argued in Section III, we need to put a header in atypical sequences to inform the decoder that an atypical encoder is used. We can therefore write $C(x) = \tau + \tilde C(x)$, where $\tau$ is the number of bits for the header, and $\tilde C(x)$ is the number of bits used for encoding the data itself.

For encoding the data itself an obvious solution is to use a universal source coder. There are many approaches to universal source coding: Lempel-Ziv [6], [44], [45], Burrows-Wheeler transform [46], prediction by partial matching (PPM) [47], [48], or T-complexity [49], [6], [50], [5], [5], [53], [54], and any one of them could be applied to the problem considered in this paper. The idea of atypicality is not linked to any particular coding strategy. In fact a coding strategy does not need to be decided. We could try several source coders and choose the one giving the shortest code length; or they could even be combined as in [55]. However, to control complexity, we choose a single source coder. The most popular and simplest approach to source coding is perhaps Lempel-Ziv [6], [44], [45].
The issue with this is that while it is optimum in the sense that $\limsup_{l\to\infty}\frac{1}{l}C(x) = H(X)$ with probability one, the convergence is very slow. According to [56], $E\left[\frac{1}{l}C(x)\right] - H(X) \approx \frac{1}{\log l}$ while $\mathrm{var}\left[\frac{1}{l}C(x)\right] \approx \frac{1}{l}$. Thus, Lempel-Ziv is poor for short sequences, which is exactly what we are interested in for atypicality. We have therefore chosen to use the Context Tree Weighting (CTW) algorithm [43]. The CTW approach has some advantages in our setup: it is a natural extension of the simple example considered in Section III, it allows estimation of code length without actually encoding, and there is flexibility in how to estimate probabilities. Importantly, it can be seen as a practical implementation of the FSM based descriptive length used in Section IV-A.

C. Typical Encoding and Training

In Definition 1 and the example in Section III we have assumed that the typical model of data is exactly known. If that is the case, typical encoding is straightforward, using for example arithmetic coding (notice that we just need the codelength, which can be calculated for arithmetic coding without actually encoding). However, in many cases


More information

Some Measures for Asymmetry of Distributions

Some Measures for Asymmetry of Distributions Some Measures for Asymmetry of Distributions Georgi N. Boshnakov First version: 31 January 2006 Research Report No. 5, 2006, Probabiity and Statistics Group Schoo of Mathematics, The University of Manchester

More information

Algorithms to solve massively under-defined systems of multivariate quadratic equations

Algorithms to solve massively under-defined systems of multivariate quadratic equations Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations

More information

Chemical Kinetics Part 2

Chemical Kinetics Part 2 Integrated Rate Laws Chemica Kinetics Part 2 The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates the rate

More information

II. PROBLEM. A. Description. For the space of audio signals

II. PROBLEM. A. Description. For the space of audio signals CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time

More information

Chemical Kinetics Part 2. Chapter 16

Chemical Kinetics Part 2. Chapter 16 Chemica Kinetics Part 2 Chapter 16 Integrated Rate Laws The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates

More information

Count-Min Sketches for Estimating Password Frequency within Hamming Distance Two

Count-Min Sketches for Estimating Password Frequency within Hamming Distance Two Count-Min Sketches for Estimating Password Frequency within Hamming Distance Two Leah South and Dougas Stebia Schoo of Mathematica Sciences, Queensand University of Technoogy, Brisbane, Queensand, Austraia

More information

Gauss Law. 2. Gauss s Law: connects charge and field 3. Applications of Gauss s Law

Gauss Law. 2. Gauss s Law: connects charge and field 3. Applications of Gauss s Law Gauss Law 1. Review on 1) Couomb s Law (charge and force) 2) Eectric Fied (fied and force) 2. Gauss s Law: connects charge and fied 3. Appications of Gauss s Law Couomb s Law and Eectric Fied Couomb s

More information

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries c 26 Noninear Phenomena in Compex Systems First-Order Corrections to Gutzwier s Trace Formua for Systems with Discrete Symmetries Hoger Cartarius, Jörg Main, and Günter Wunner Institut für Theoretische

More information

Testing for the Existence of Clusters

Testing for the Existence of Clusters Testing for the Existence of Custers Caudio Fuentes and George Casea University of Forida November 13, 2008 Abstract The detection and determination of custers has been of specia interest, among researchers

More information

A Statistical Framework for Real-time Event Detection in Power Systems

A Statistical Framework for Real-time Event Detection in Power Systems 1 A Statistica Framework for Rea-time Event Detection in Power Systems Noan Uhrich, Tim Christman, Phiip Swisher, and Xichen Jiang Abstract A quickest change detection (QCD) agorithm is appied to the probem

More information

b n n=1 a n cos nx (3) n=1

b n n=1 a n cos nx (3) n=1 Fourier Anaysis The Fourier series First some terminoogy: a function f(x) is periodic if f(x ) = f(x) for a x for some, if is the smaest such number, it is caed the period of f(x). It is even if f( x)

More information

Reichenbachian Common Cause Systems

Reichenbachian Common Cause Systems Reichenbachian Common Cause Systems G. Hofer-Szabó Department of Phiosophy Technica University of Budapest e-mai: gszabo@hps.ete.hu Mikós Rédei Department of History and Phiosophy of Science Eötvös University,

More information

THE OUT-OF-PLANE BEHAVIOUR OF SPREAD-TOW FABRICS

THE OUT-OF-PLANE BEHAVIOUR OF SPREAD-TOW FABRICS ECCM6-6 TH EUROPEAN CONFERENCE ON COMPOSITE MATERIALS, Sevie, Spain, -6 June 04 THE OUT-OF-PLANE BEHAVIOUR OF SPREAD-TOW FABRICS M. Wysocki a,b*, M. Szpieg a, P. Heström a and F. Ohsson c a Swerea SICOMP

More information

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient

More information

Lecture 6 Povh Krane Enge Williams Properties of 2-nucleon potential

Lecture 6 Povh Krane Enge Williams Properties of 2-nucleon potential Lecture 6 Povh Krane Enge Wiiams Properties of -nuceon potentia 16.1 4.4 3.6 9.9 Meson Theory of Nucear potentia 4.5 3.11 9.10 I recommend Eisberg and Resnik notes as distributed Probems, Lecture 6 1 Consider

More information

Unconditional security of differential phase shift quantum key distribution

Unconditional security of differential phase shift quantum key distribution Unconditiona security of differentia phase shift quantum key distribution Kai Wen, Yoshihisa Yamamoto Ginzton Lab and Dept of Eectrica Engineering Stanford University Basic idea of DPS-QKD Protoco. Aice

More information

Haar Decomposition and Reconstruction Algorithms

Haar Decomposition and Reconstruction Algorithms Jim Lambers MAT 773 Fa Semester 018-19 Lecture 15 and 16 Notes These notes correspond to Sections 4.3 and 4.4 in the text. Haar Decomposition and Reconstruction Agorithms Decomposition Suppose we approximate

More information

Coded Caching for Files with Distinct File Sizes

Coded Caching for Files with Distinct File Sizes Coded Caching for Fies with Distinct Fie Sizes Jinbei Zhang iaojun Lin Chih-Chun Wang inbing Wang Department of Eectronic Engineering Shanghai Jiao ong University China Schoo of Eectrica and Computer Engineering

More information

Statistical Learning Theory: a Primer

Statistical Learning Theory: a Primer ??,??, 1 6 (??) c?? Kuwer Academic Pubishers, Boston. Manufactured in The Netherands. Statistica Learning Theory: a Primer THEODOROS EVGENIOU AND MASSIMILIANO PONTIL Center for Bioogica and Computationa

More information

A Comparison Study of the Test for Right Censored and Grouped Data

A Comparison Study of the Test for Right Censored and Grouped Data Communications for Statistica Appications and Methods 2015, Vo. 22, No. 4, 313 320 DOI: http://dx.doi.org/10.5351/csam.2015.22.4.313 Print ISSN 2287-7843 / Onine ISSN 2383-4757 A Comparison Study of the

More information

Estimating the Power Spectrum of the Cosmic Microwave Background

Estimating the Power Spectrum of the Cosmic Microwave Background Estimating the Power Spectrum of the Cosmic Microwave Background J. R. Bond 1,A.H.Jaffe 2,andL.Knox 1 1 Canadian Institute for Theoretica Astrophysics, Toronto, O M5S 3H8, CAADA 2 Center for Partice Astrophysics,

More information

c 2007 Society for Industrial and Applied Mathematics

c 2007 Society for Industrial and Applied Mathematics SIAM REVIEW Vo. 49,No. 1,pp. 111 1 c 7 Society for Industria and Appied Mathematics Domino Waves C. J. Efthimiou M. D. Johnson Abstract. Motivated by a proposa of Daykin [Probem 71-19*, SIAM Rev., 13 (1971),

More information

8 Digifl'.11 Cth:uits and devices

8 Digifl'.11 Cth:uits and devices 8 Digif'. Cth:uits and devices 8. Introduction In anaog eectronics, votage is a continuous variabe. This is usefu because most physica quantities we encounter are continuous: sound eves, ight intensity,

More information

FRIEZE GROUPS IN R 2

FRIEZE GROUPS IN R 2 FRIEZE GROUPS IN R 2 MAXWELL STOLARSKI Abstract. Focusing on the Eucidean pane under the Pythagorean Metric, our goa is to cassify the frieze groups, discrete subgroups of the set of isometries of the

More information

Improving the Accuracy of Boolean Tomography by Exploiting Path Congestion Degrees

Improving the Accuracy of Boolean Tomography by Exploiting Path Congestion Degrees Improving the Accuracy of Booean Tomography by Expoiting Path Congestion Degrees Zhiyong Zhang, Gaoei Fei, Fucai Yu, Guangmin Hu Schoo of Communication and Information Engineering, University of Eectronic

More information

Week 6 Lectures, Math 6451, Tanveer

Week 6 Lectures, Math 6451, Tanveer Fourier Series Week 6 Lectures, Math 645, Tanveer In the context of separation of variabe to find soutions of PDEs, we encountered or and in other cases f(x = f(x = a 0 + f(x = a 0 + b n sin nπx { a n

More information

Throughput Optimal Scheduling for Wireless Downlinks with Reconfiguration Delay

Throughput Optimal Scheduling for Wireless Downlinks with Reconfiguration Delay Throughput Optima Scheduing for Wireess Downinks with Reconfiguration Deay Vineeth Baa Sukumaran vineethbs@gmai.com Department of Avionics Indian Institute of Space Science and Technoogy. Abstract We consider

More information

MONTE CARLO SIMULATIONS

MONTE CARLO SIMULATIONS MONTE CARLO SIMULATIONS Current physics research 1) Theoretica 2) Experimenta 3) Computationa Monte Caro (MC) Method (1953) used to study 1) Discrete spin systems 2) Fuids 3) Poymers, membranes, soft matter

More information

Physics 127c: Statistical Mechanics. Fermi Liquid Theory: Collective Modes. Boltzmann Equation. The quasiparticle energy including interactions

Physics 127c: Statistical Mechanics. Fermi Liquid Theory: Collective Modes. Boltzmann Equation. The quasiparticle energy including interactions Physics 27c: Statistica Mechanics Fermi Liquid Theory: Coective Modes Botzmann Equation The quasipartice energy incuding interactions ε p,σ = ε p + f(p, p ; σ, σ )δn p,σ, () p,σ with ε p ε F + v F (p p

More information

CS 331: Artificial Intelligence Propositional Logic 2. Review of Last Time

CS 331: Artificial Intelligence Propositional Logic 2. Review of Last Time CS 33 Artificia Inteigence Propositiona Logic 2 Review of Last Time = means ogicay foows - i means can be derived from If your inference agorithm derives ony things that foow ogicay from the KB, the inference

More information

arxiv: v2 [cond-mat.stat-mech] 14 Nov 2008

arxiv: v2 [cond-mat.stat-mech] 14 Nov 2008 Random Booean Networks Barbara Drosse Institute of Condensed Matter Physics, Darmstadt University of Technoogy, Hochschustraße 6, 64289 Darmstadt, Germany (Dated: June 27) arxiv:76.335v2 [cond-mat.stat-mech]

More information

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University Turbo Codes Coding and Communication Laboratory Dept. of Eectrica Engineering, Nationa Chung Hsing University Turbo codes 1 Chapter 12: Turbo Codes 1. Introduction 2. Turbo code encoder 3. Design of intereaver

More information

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION SAHAR KARIMI AND STEPHEN VAVASIS Abstract. In this paper we present a variant of the conjugate gradient (CG) agorithm in which we invoke a subspace minimization

More information

Universal Consistency of Multi-Class Support Vector Classification

Universal Consistency of Multi-Class Support Vector Classification Universa Consistency of Muti-Cass Support Vector Cassification Tobias Gasmachers Dae Moe Institute for rtificia Inteigence IDSI, 6928 Manno-Lugano, Switzerand tobias@idsia.ch bstract Steinwart was the

More information

From Margins to Probabilities in Multiclass Learning Problems

From Margins to Probabilities in Multiclass Learning Problems From Margins to Probabiities in Muticass Learning Probems Andrea Passerini and Massimiiano Ponti 2 and Paoo Frasconi 3 Abstract. We study the probem of muticass cassification within the framework of error

More information

Introduction. Figure 1 W8LC Line Array, box and horn element. Highlighted section modelled.

Introduction. Figure 1 W8LC Line Array, box and horn element. Highlighted section modelled. imuation of the acoustic fied produced by cavities using the Boundary Eement Rayeigh Integra Method () and its appication to a horn oudspeaer. tephen Kirup East Lancashire Institute, Due treet, Bacburn,

More information

IE 361 Exam 1. b) Give *&% confidence limits for the bias of this viscometer. (No need to simplify.)

IE 361 Exam 1. b) Give *&% confidence limits for the bias of this viscometer. (No need to simplify.) October 9, 00 IE 6 Exam Prof. Vardeman. The viscosity of paint is measured with a "viscometer" in units of "Krebs." First, a standard iquid of "known" viscosity *# Krebs is tested with a company viscometer

More information

LECTURE NOTES 9 TRACELESS SYMMETRIC TENSOR APPROACH TO LEGENDRE POLYNOMIALS AND SPHERICAL HARMONICS

LECTURE NOTES 9 TRACELESS SYMMETRIC TENSOR APPROACH TO LEGENDRE POLYNOMIALS AND SPHERICAL HARMONICS MASSACHUSETTS INSTITUTE OF TECHNOLOGY Physics Department Physics 8.07: Eectromagnetism II October 7, 202 Prof. Aan Guth LECTURE NOTES 9 TRACELESS SYMMETRIC TENSOR APPROACH TO LEGENDRE POLYNOMIALS AND SPHERICAL

More information

Backward Monte Carlo Simulations in Radiative Heat Transfer

Backward Monte Carlo Simulations in Radiative Heat Transfer Backward Monte Caro Simuations in Radiative Heat Transfer Michae F. Modest Department of Mechanica and Nucear Engineering Penn State University University Park, PA 82 emai: mfm@psu.edu August 29, 2 Abstract

More information

THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE

THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE KATIE L. MAY AND MELISSA A. MITCHELL Abstract. We show how to identify the minima path network connecting three fixed points on

More information

Partial permutation decoding for MacDonald codes

Partial permutation decoding for MacDonald codes Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics

More information

Higher dimensional PDEs and multidimensional eigenvalue problems

Higher dimensional PDEs and multidimensional eigenvalue problems Higher dimensiona PEs and mutidimensiona eigenvaue probems 1 Probems with three independent variabes Consider the prototypica equations u t = u (iffusion) u tt = u (W ave) u zz = u (Lapace) where u = u

More information

Lecture Note 3: Stationary Iterative Methods

Lecture Note 3: Stationary Iterative Methods MATH 5330: Computationa Methods of Linear Agebra Lecture Note 3: Stationary Iterative Methods Xianyi Zeng Department of Mathematica Sciences, UTEP Stationary Iterative Methods The Gaussian eimination (or

More information

c 2016 Georgios Rovatsos

c 2016 Georgios Rovatsos c 2016 Georgios Rovatsos QUICKEST CHANGE DETECTION WITH APPLICATIONS TO LINE OUTAGE DETECTION BY GEORGIOS ROVATSOS THESIS Submitted in partia fufiment of the requirements for the degree of Master of Science

More information

PHYS 110B - HW #1 Fall 2005, Solutions by David Pace Equations referenced as Eq. # are from Griffiths Problem statements are paraphrased

PHYS 110B - HW #1 Fall 2005, Solutions by David Pace Equations referenced as Eq. # are from Griffiths Problem statements are paraphrased PHYS 110B - HW #1 Fa 2005, Soutions by David Pace Equations referenced as Eq. # are from Griffiths Probem statements are paraphrased [1.] Probem 6.8 from Griffiths A ong cyinder has radius R and a magnetization

More information

1D Heat Propagation Problems

1D Heat Propagation Problems Chapter 1 1D Heat Propagation Probems If the ambient space of the heat conduction has ony one dimension, the Fourier equation reduces to the foowing for an homogeneous body cρ T t = T λ 2 + Q, 1.1) x2

More information

Integrality ratio for Group Steiner Trees and Directed Steiner Trees

Integrality ratio for Group Steiner Trees and Directed Steiner Trees Integraity ratio for Group Steiner Trees and Directed Steiner Trees Eran Haperin Guy Kortsarz Robert Krauthgamer Aravind Srinivasan Nan Wang Abstract The natura reaxation for the Group Steiner Tree probem,

More information

Problem Set 6: Solutions

Problem Set 6: Solutions University of Aabama Department of Physics and Astronomy PH 102 / LeCair Summer II 2010 Probem Set 6: Soutions 1. A conducting rectanguar oop of mass M, resistance R, and dimensions w by fas from rest

More information

Copyright information to be inserted by the Publishers. Unsplitting BGK-type Schemes for the Shallow. Water Equations KUN XU

Copyright information to be inserted by the Publishers. Unsplitting BGK-type Schemes for the Shallow. Water Equations KUN XU Copyright information to be inserted by the Pubishers Unspitting BGK-type Schemes for the Shaow Water Equations KUN XU Mathematics Department, Hong Kong University of Science and Technoogy, Cear Water

More information

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS ISEE 1 SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS By Yingying Fan and Jinchi Lv University of Southern Caifornia This Suppementary Materia

More information

Pattern Frequency Sequences and Internal Zeros

Pattern Frequency Sequences and Internal Zeros Advances in Appied Mathematics 28, 395 420 (2002 doi:10.1006/aama.2001.0789, avaiabe onine at http://www.ideaibrary.com on Pattern Frequency Sequences and Interna Zeros Mikós Bóna Department of Mathematics,

More information

A proposed nonparametric mixture density estimation using B-spline functions

A proposed nonparametric mixture density estimation using B-spline functions A proposed nonparametric mixture density estimation using B-spine functions Atizez Hadrich a,b, Mourad Zribi a, Afif Masmoudi b a Laboratoire d Informatique Signa et Image de a Côte d Opae (LISIC-EA 4491),

More information

arxiv: v1 [math.ca] 6 Mar 2017

arxiv: v1 [math.ca] 6 Mar 2017 Indefinite Integras of Spherica Besse Functions MIT-CTP/487 arxiv:703.0648v [math.ca] 6 Mar 07 Joyon K. Boomfied,, Stephen H. P. Face,, and Zander Moss, Center for Theoretica Physics, Laboratory for Nucear

More information

On the Goal Value of a Boolean Function

On the Goal Value of a Boolean Function On the Goa Vaue of a Booean Function Eric Bach Dept. of CS University of Wisconsin 1210 W. Dayton St. Madison, WI 53706 Lisa Heerstein Dept of CSE NYU Schoo of Engineering 2 Metrotech Center, 10th Foor

More information

BDD-Based Analysis of Gapped q-gram Filters

BDD-Based Analysis of Gapped q-gram Filters BDD-Based Anaysis of Gapped q-gram Fiters Marc Fontaine, Stefan Burkhardt 2 and Juha Kärkkäinen 2 Max-Panck-Institut für Informatik Stuhsatzenhausweg 85, 6623 Saarbrücken, Germany e-mai: stburk@mpi-sb.mpg.de

More information

arxiv:math/ v2 [math.pr] 6 Mar 2005

arxiv:math/ v2 [math.pr] 6 Mar 2005 ASYMPTOTIC BEHAVIOR OF RANDOM HEAPS arxiv:math/0407286v2 [math.pr] 6 Mar 2005 J. BEN HOUGH Abstract. We consider a random wa W n on the ocay free group or equivaenty a signed random heap) with m generators

More information

David Eigen. MA112 Final Paper. May 10, 2002

David Eigen. MA112 Final Paper. May 10, 2002 David Eigen MA112 Fina Paper May 1, 22 The Schrodinger equation describes the position of an eectron as a wave. The wave function Ψ(t, x is interpreted as a probabiity density for the position of the eectron.

More information

The EM Algorithm applied to determining new limit points of Mahler measures

The EM Algorithm applied to determining new limit points of Mahler measures Contro and Cybernetics vo. 39 (2010) No. 4 The EM Agorithm appied to determining new imit points of Maher measures by Souad E Otmani, Georges Rhin and Jean-Marc Sac-Épée Université Pau Veraine-Metz, LMAM,

More information