On the length of the longest exact position. match in a random sequence
|
|
- Neil Holmes
- 5 years ago
- Views:
Transcription
1 On the length of the longest exact position match in a random sequence G. Reinert and Michael S. Waterman Abstract A mixed Poisson approximation and a Poisson approximation for the length of the longest exact match of a random sequence across another sequence are provided, where the match is required to start at position 1 in the first sequence. This problem arises when looking for suitable anchors in whole genome alignments. AMS 2000 subject classifications. 62E17, 92D20. Key words and phrases: Poisson approximation, mixed Poisson approximation, length of longest match, Chen-Stein method Corresponding author. Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK; reinert@stats.ox.ac.uk; phone ; fax GR was supported in part by EPSRC grant no. GR/R52183/01. Molecular and Computational Biology, University of Southern California, 835 W 37th Street, SHS 172, Los Angeles, California , USA; msw@usc.edu; phone ; fax MSW was supported by NIH grant no. P50 HG
2 1 Introduction When aligning whole genomes, often a seed-and-extend technique is used. Starting from exact or near-exact matches, reliable ones among these matches are selected as anchors, and then the remaining stretches are filled in using local and global alignment. See Lippert et al. [1] for a discussion of genome alignment methods using anchors. To select a match that is both sensitive and specific, [1] introduce a score based on the length, R n, of the longest exact match of a random sequence across another sequence, where shifts are not allowed. For R n and the associated scores, [1] find that their approach based on a mixed Poisson approximation, although valid, is computationally not feasible if the distribution of the random letters making up the random sequences is not uniform, as the mixing takes place over too many terms; the authors resort to a Monte Carlo method. Here we provide a Poisson approximation for the number of matches of fixed length, along with bounds provided by the Chen-Stein method, and we obtain an approximate expression for the cumulative distribution function of R n that is easy to compute. The bound on the error in the approximation turns out to be small, thus making our suggestion a useful approach. The set-up for our problem is as follows. Let A = A 1 A 2... A n and B = B 1 B 2... B n be two independent sequences with i.i.d. letters from a finite alphabet A with d elements. Let π(a) be the probability that a random letter takes on the letter a, and let π = max a A π(a) be the maximum of these probabilities. The letter distribution is not necessarily uniform. We put R n = max {A k = B j+k, k = 1,..., m, for some 0 j n m}; m thus R n denotes the length of the longest exact match of a random sequence across another 2
3 sequence, where shifts are not allowed. Note that if the match in sequence A was not required to start at position 1, the problem would reduce to the distribution of the well understood H n = max {A i+k = B j+k, k = 1,..., m, for some 0 i, j n m}, m see Waterman [3]. Our problem differs from the study of H n by requiring an exact match beginning at a fixed position in the first sequence. To reveal the Poisson-type structure in the problem, we use a standard duality argument as follows. If R n < m then there are no matches of length m (or longer) in the sequence. Ignoring end effects, this means that there are no occurrences of A 1... A m in B. Let W m denote the number of (clumps of) matches of length m (or longer) in the sequence, so that P (R n < m) P (W m = 0). In Section 2 we shall give a mixed Poisson approximation for P (W m = 0). Section 3 derives the Poisson approximation for P (W m = 0) and applies it to obtain an approximation, with bound, for P (R n < m). Finally, in Section 4 we illustrate that the approximation for P (R n < m) is indeed easily computatble. 2 A mixed Poisson approximation For Poisson and mixed Poisson approximation it is useful to think in terms of clumps of occurrences, see Reinert et al. [2], because de-clumping disentangles the dependence arising from self-overlap of words. We say that a clump of a word = m starts at position i in B if there is an occurrence of at position i, and there is no (overlapping) occurrence of at positions i m + 1,..., i 1. 3
4 Thus when ignoring end effects the study of R n is equivalent to the study of W m = n i=1 1(a clump of A 1... A m starts at position i in B), where we abbreviate n = n m + 1. End effects only arise from the possibility that, when embedded in an infinite sequence, the sequence B = B 1 B 2... B n starts within a clump in the infinite sequence. Assume that B = B 1 B 0 B 1 B n B n+1 is an infinite sequence for now, so that we can ignore end effects. Then we have R n < m W m = 0. If m is large enough, then a fixed word of length m will rarely occur at a given position i in the random sequence B. When using clumps in order to account for the strong dependence between neighbouring occurrences in the case that has a large amount of self-overlap, it is plausible and indeed established that the number of clumps of in B is approximately Poisson distributed, Proposition 1 below. For any fixed, we let n W m () = 1(a clump of starts at position i in B). i=1 In what follows we shall always assume that = w 1 w m A m, so that m µ() = π(w i ) i=1 is the probability of a random word of length m equals. If there is a p such that w i = w i+p, i = 1,..., m p, then p is called a period of. A period is a principal period if it is not a strict multiple of the minimal period. An occurrence of starting at postion i is a clump if and only if for none of the periods p of, the truncated word (p) = w 1 w p 4
5 starts at position i p. It is easy to see that it suffices to consider all principal periods. The probability that a clump of starts at a given position in the sequence is then given by µ(w) = µ(w) µ(w (p) w), p P () where (p) = w 1 w p w 1 w m is the concatenated word, and P () is the set of principal periods of. In particular, EW = λ() := n µ(). To describe the distance between the distributions of non-negative integer valued random variables X and Y we use the total variation distance, defined by d T V (X, Y ) = sup P (X B) P (Y B). B {0,1,...} It will be convenient to abbreviate, for r = 1, 2, 3,..., π r = a A(π(a)) r, the probability that r random letters match. Corollary in [2] together with the independence of the letters immediately gives the following proposition. Proposition 1 Let Z() P o( λ()) be Poisson distributed with mean λ(). Then d TV ( L(Wm ()), P o( λ())) ) (n m + 1) µ(){(6m 5) µ() + 2(m 1)µ()}. Proposition 1 only counts the number of occurrences of a fixed word, whereas in our problem, the first m letters of the sequence A, namely A 1... A m, constitute a random word. Thus we need to condition on the words that A 1... A m take on, and using the rule of total probability, we obtain a mixed Poisson approximation. 5
6 Theorem 1 With the above notation, P (W m = 0) µ()p (P o( λ()) = 0) Rem 1 (8m 7) nπ m 3. Remark 1 Recalling that π = max a A π(a), we note that λ() n(π ) m. If we consider the regime that n(π ) m is approximately constant, with m fixed, then (π ) m = O(n 1 ), and using the bound π 3 (π ) 2 a A π(a) = (π ) 2, we obtain that Rem 1 = O(n 1 ), thus indicating that the bound in Theorem 1 is of useful order. Proof of Theorem 1. Writing out the different sequences that A 1 A 2... A m can take on, we have P (W m = 0) = µ()p (W m () = 0) = µ()p (P o( λ()) = 0) + µ()ɛ 1 (), where, by Proposition 1, µ()ɛ 1 n µ() µ(){(6m 5) µ() + 2(m 1)µ()} =: Rem 1. For Rem 1 we use that µ() µ() to bound Rem 1 (8m 7) n (µ()) 3. Now, if A 1 A m, B 1 B m, and C 1 C m are three independent random words, then (µ()) 3 = P (A 1 A m = )P (B 1 B m = )P (C 1 C m = ) = P (A 1 A m = B 1 B m = C 1 C m ) = (π 3 ) m using that the letters are independent, so that Rem 1 (8m 7) n(π 3 ) m, as claimed. 6
7 3 Poisson approximation to the mixed Poisson approximation Although Theorem 1 is valid, the probability µ()p (P o( λ()) = 0) is difficult to evaluate, the sum growing exponentially with alphabet size. As much of the computational difficulty lies in accounting for the different periods in all words A m, our idea is to approximate P (P o( λ()) = 0) by the simpler expression P (P o(λ()) = 0), where λ() := nµ(). Thus we ignore the period correction in the Poisson parameter. While this may much distort the limiting distribution for words with a large amount of self-overlap, there are not too many such words in A m ; indeed we provide a bound on the error in this approximation in the next theorem. Recall that π = max a A π(a). Theorem 2 For A m, let Z() have Poisson distribution with mean λ(), and let Z() have Poisson distribution with mean λ(). Abbreviate f = π2 2 π 3. Then where µ()p ( Z() = 0) µ()p (Z() = 0) = Rem 2, Rem 2 (1 e n(π ) m ) { (π 2 ) m f m+ m f + m 2 (π ) m }. (1) Here, x denotes the integer part of x. Remark 2 In the regime considered in Remark 1, that n(π ) m is approximately constant it follows that Rem 2 = O(n 1 ), indicating that the bound in Theorem 2 is usable. 7
8 Proof of Theorem 2. Firstly we wish to bound P ( Z() = 0) P (Z() = 0) = e λ() e λ(). By series expansion it is easy to see that for any 0 ν λ <, λ ( e λ ν 1 ) (λ ν) ( e λ 1 ) and direct manipulation yields e ν e λ (λ µ) 1 e λ. λ Applying this bound with λ = λ() and ν = λ(), we have for each fixed that Thus P ( Z() = 0) P (Z() = 0) 1 e λ() λ() n p P () µ(w (p) w). µ()p ( Z() = 0) µ()p (Z() = 0) = Rem 2, where Rem 2 µ() 1 e λ() λ() n p P () µ(w (p) w) m 1 (1 e n(π ) m ) 1(p P ())µ(w (p) w). (2) p=1 In the last step we used the uniform bound µ() (π ) m for all. To bound the sum p P () µ(w (p) w) we consider the cases that p m 2 and p m separately. 8
9 For p m + 1 we note that 2p + 1 m, and writing out the period yields 2 1(p P ())µ(w (p) w) = 1(p P ())P (A 1... A m+p = w (p) w) = 1(p P ())P (A 1... A m+p = w (p) w; A 1 = A p+1 = A 2p+1,..., A m p = A m = A m+p ; A m p+1 = A m+1,..., A p = A 2p ) P (A l = A l+p = A l+2p, l = 1,..., m p; A l = A l+p, l = m p + 1,..., p). But this probability can be expressed by the probability π 3 that three random letters match, and the probability π 2 that two random letters match. We have m p equations forcing the matching of three random letters each, and 2p m equations forcing the matching of two random letters each. As the letters are independent, the probabilities are easy to calculate; Thus with f = π2 2 π 3, which is less or equal to 1, P (A l = A l+p = A l+2p, l = 1,..., m p; A l = A l+p, l = m p + 1,..., p) ( ) ( ) = π m p 3 π 2p m π3 m π 2 p 2 = 2. π 2 π 3 m 1 1(p P ())µ(w (p) w) p= m 2 +1 m 1 p= m 2 +1 π m p 3 π 2p m 2 = (π 2 ) m f m+ m (3) 1 f For p m 2 we note that if a word has period p m 2, then the letters w p+1,..., w m are uniquely determined. Therefore any word can possess at most one prinicpal period p m 2. 9
10 Again spelling out the periodicity, we obtain that m 2 1(p P ())µ(w (p) w) p=1 = m 2 p=1 ( P A i = A i+p =... = A i+( m+p )p, i = 1,..., m(modp); p A j = A j+p =... = A j+( m p )p, j = m(modp) + 1,..., p ) m 2 ( ) p m(mod p) ( ) m(mod p) π m+p p π m+p. (4) p +1 p=1 Expression (4) can be bounded further by using that π r π π r 1 (π ) r 1, giving m 2 p=1 ( π m+p p ) p m(mod p) ( ) m(mod p) π m+p p +1 = m 2 p=1 m 2 p=1 (π ) (π m(mod p) m+p p (π ) m m p (π p m+p p ) p ) p (π ) m m. (5) 2 Summarizing we obtain from (3) and (5) that m 1 p=1 1(p P ())µ(w (p) w) (π 2 ) m f m+ m f + m 2 (π ) m. (6) Substituting in (2) gives the stated result. Remark 3 As π = max a A π(a), the bound µ(w (p) w) (π ) m+p is immediate. Using that any word of length m that has a principal period p m is completely determined by its 2 first p letters, instead of using (4), a quick and dirty bound is m 2 1(p P ())µ(w (p) w) p=1 m 2 p=1 d p (π ) m+p = (π ) m+1 d (π d) m 2 1. π d 1 As a A π a = 1, we have that π d 1. However, if the letter distribution is close to uniform, and if m is relatively large, then the above bound will be small. 10
11 Now we apply our results to the original problem, the cumulative distribution function of R n, the length of the longest exact position match. Corollary 1 For A m, as in Theorem 2 let Z() have Poisson distribution with mean λ(). Then P (R n < m) P (Z() = 0) Rem 3, A m where Rem 3 = Rem 1 + ( 1 + (m 1)(π ) m (1 e n(π ) m ) 1) Rem 2, with Rem 1 given in Theorem 1 and Rem 2 given in Theorem 2. Remark 4 In the regime that n(π ) m is approximately constant, we have already seen in Remark 1 and in Remark 2 that Rem 1 = O(n 1 ) and Rem 2 = O(n 1 ), and so also Rem 3 = O(n 1 ), providing a useful bound. Proof of Corollary 1. In view of Theorem 1 and Theorem 2, all that is required is to bound the end effects, resulting from B having been idealized as just a part of an infinite sequence when it came to counting clumps. To bound the end effects, note that (see, e.g. [2], Equation (6.4.10) ) P {1(R n > m) 1(W m = 0)} (m 1) µ()(µ() µ()) (m 1)(π ) m 1(p P ())µ( (p) ). p We now use (6), giving that P {1(R n > m) 1(W m = 0)} (m 1)(π ) m { (π 2 ) m 1 f m 2 +1 m 11 f 1 + (π ) m m 2 }.
12 Applying Theorem 1 and Theorem 2 finishes the proof. Remark 5 Lippert et al. [1] introduce as Z-score Z i,n = max {A i+k = A j+k, k = 0,..., m 1; 1 i j n}. m This is similar to R n but allows self-overlap. [1] show that the probability P { L i=1 1(Z i,n k)} that the scores Z i,n exceed k consecutively across L positions can be expressed by probabilities involving only R n, so Corollary 1 can be applied to approximate the distribution of the scores. 4 Numerical illustration A counting argument shows that µ()p (Z() = 0) = µ()e λ() is not as difficult to evaluate as µ()p ( Z() = 0), as follows. Let n a () denote the number of times that letter a A appears in. Then, as the letters are independent, µ() = (π(a)) na() a A and hence we obtain the multinomial expression µ()p (Z() = 0) (7) ( ) { } { m = π(a) na exp n } π(a) na. (n (na,a A):na {0,1,...,m}, a, a A) a A a A a A na=m While there does not appear to exist a simplifying expression in general, we note that (7) is a polynomial problem in m; indeed we only need to evaluate O(m d 1 ) summands instead of O(d m ) summands. As we consider d typically much smaller than m, this is a considerable reduction in complexity. 12
13 In particular, if A = {A, C, G, T } and if π A = π T, π C = π G, as may be reasonable to assume when considering both a DNA sequence and its reverse-complement, then denoting the base-pair probabilities by p = 2π A = 1 2π C, the expression (7) simplifies to a binomial expectation, µ()p (Z() = 0) = m k=0 ( ) m p k (1 p) m k exp { n2 m p k (1 p) m k}, k where k stands for the sum n A + n T. Example. Suppose as in [1] that n = , the estimated length of the human genome, NCBI build 28 and build 34, with alphabet A = {A, C, G, T } of size d = 4, and basecomposition in a non-repeat region estimated as p A = p T = 0.29 and p C = p G = 0.21, so that π = 0.29 and p = 2p A = Then, truncating after the first four digits, π 2 = , π2 2 = , π 3 = , f = , and we can calculate the mean of λ(a 1 A 2 A m ) using that Eλ(A 1 A 2 A m ) = n µ() 2 = n2 m m k=0 ( ) m p 2k (1 p) 2(m k). k Table 1 gives the expected Poisson parameter for m = 15,..., 22. m Eλ Table 1: Expected Poisson parameter Thus for m = 15 we would expect W 15 = 0, and hence R n < 15, with low probability, whereas for m = 22 we would expect W 22 = 0 with high probability, hence R n < 22 with high probability. 13
14 Table 2 below gives a summary of the estimated probability ρ(m) = Am P (Z() = 0) for P (R n m) 1 P (W m = 0) obtained in Corollary 1, for m = 15, 16,..., 22, along with the Monte-Carlo estimates ˆρ(m) from [1]; we note that Table 8 in [1] indeed gives estimates for P (R n m) instead of P (R n < m) as written ibid. We add our bound from Corollary 1 along with the estimated standard deviation terms contributing to our bound; recall that Rem 2 is given in (1). V arˆρ(m) from [1], and the separate remainder m ρ(m) ˆρ(m) bound V arˆρ(m) Rem 1 Rem e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e-13 Table 2: Estimated probabilities, bounds, and remainder terms Our approximated probabilities are similar to the Monte-Carlo estimates in [1]. However, whereas [1] can only conclude that, say, an approximate 95% confidence interval for the true probability P (R n m) is given by ˆρ ± 1.96 V arˆρ(m), we indeed proved that the true probability will lie within ρ(m) ± bound, which is a shorter interval for all values of m considered in this example. 14
15 Also we see that both remainder terms Rem 1 and Rem 2 contribute in similar magnitude to the bound Rem 3, indicating that the bound on the error made in replacing the mixed Poisson approximation by the Poisson approximation is not much larger than the bound on the error made by the mixed Poisson approximation in the first place. References [1] Lippert, R.A., Zhao, X., Florea, L., Mobarry, C., and Istrail, S. (2004). Finding Anchors for Genomic Sequence Comparison. In Proceedings of the 8th Annual International Conference on Research in Computational Biology (RECOMB 04), ACM Press, Also in J. Comp. Biol. 12, (2005). [2] Reinert, G., Schbath, S., and Waterman, M.S. (2005). Statistics on words with applications to biological sequences. In Lothaire: Applied Combinatorics on Words J. Berstel and D. Perrin, eds., Cambridge University Press, [3] Waterman, M.S. (1995). Introduction to Computational Biology. Chapman and Hall. 15
The Distribution of Short Word Match Counts Between Markovian Sequences
The Distribution of Short Word Match Counts Between Markovian Sequences Conrad J. Burden 1, Paul Leopardi 1 and Sylvain Forêt 2 1 Mathematical Sciences Institute, Australian National University, Canberra,
More informationPOISSON PROCESSES 1. THE LAW OF SMALL NUMBERS
POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS 1.1. The Rutherford-Chadwick-Ellis Experiment. About 90 years ago Ernest Rutherford and his collaborators at the Cavendish Laboratory in Cambridge conducted
More informationKaraliopoulou Margarita 1. Introduction
ESAIM: Probability and Statistics URL: http://www.emath.fr/ps/ Will be set by the publisher ON THE NUMBER OF WORD OCCURRENCES IN A SEMI-MARKOV SEQUENCE OF LETTERS Karaliopoulou Margarita 1 Abstract. Let
More informationDistributional regimes for the number of k-word matches between two random sequences
Distributional regimes for the number of k-word matches between two random sequences Ross A. Lippert, Haiyan Huang, and Michael S. Waterman Informatics Research, Celera Genomics, Rockville, MD 0878; Department
More informationA simple branching process approach to the phase transition in G n,p
A simple branching process approach to the phase transition in G n,p Béla Bollobás Department of Pure Mathematics and Mathematical Statistics Wilberforce Road, Cambridge CB3 0WB, UK b.bollobas@dpmms.cam.ac.uk
More informationComparison of the similarities between two segments of biological sequences using k-tuples (also. Research Articles
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 16, Number 12, 2009 # Mary Ann Liebert, Inc. Pp. 1615 1634 DOI:10.1089=cmb.2009.0198 Research Articles Alignment-Free Sequence Comparison (I): Statistics and Power
More informationarxiv:math/ v1 [math.nt] 9 Feb 1998
ECONOMICAL NUMBERS R.G.E. PINCH arxiv:math/9802046v1 [math.nt] 9 Feb 1998 Abstract. A number n is said to be economical if the prime power factorisation of n can be written with no more digits than n itself.
More informationBounds on the generalised acyclic chromatic numbers of bounded degree graphs
Bounds on the generalised acyclic chromatic numbers of bounded degree graphs Catherine Greenhill 1, Oleg Pikhurko 2 1 School of Mathematics, The University of New South Wales, Sydney NSW Australia 2052,
More informationSubmitted to the Brazilian Journal of Probability and Statistics
Submitted to the Brazilian Journal of Probability and Statistics Multivariate normal approximation of the maximum likelihood estimator via the delta method Andreas Anastasiou a and Robert E. Gaunt b a
More informationWaiting Time Distributions for Pattern Occurrence in a Constrained Sequence
Waiting Time Distributions for Pattern Occurrence in a Constrained Sequence November 1, 2007 Valeri T. Stefanov Wojciech Szpankowski School of Mathematics and Statistics Department of Computer Science
More informationStochastic processes and
Stochastic processes and Markov chains (part II) Wessel van Wieringen w.n.van.wieringen@vu.nl wieringen@vu nl Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University
More informationStatistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology Volume 11, Issue 1 2012 Article 3 Alignment-free Sequence Comparison for Biologically Realistic Sequences of Moderate Length Conrad J. Burden,
More informationk-protected VERTICES IN BINARY SEARCH TREES
k-protected VERTICES IN BINARY SEARCH TREES MIKLÓS BÓNA Abstract. We show that for every k, the probability that a randomly selected vertex of a random binary search tree on n nodes is at distance k from
More informationCOMPOSITIONS WITH A FIXED NUMBER OF INVERSIONS
COMPOSITIONS WITH A FIXED NUMBER OF INVERSIONS A. KNOPFMACHER, M. E. MAYS, AND S. WAGNER Abstract. A composition of the positive integer n is a representation of n as an ordered sum of positive integers
More informationSPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS
SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS TSOGTGEREL GANTUMUR Abstract. After establishing discrete spectra for a large class of elliptic operators, we present some fundamental spectral properties
More informationMonte Carlo Simulation
Monte Carlo Simulation 198 Introduction Reliability indices of an actual physical system could be estimated by collecting data on the occurrence of failures and durations of repair. The Monte Carlo method
More informationA misère-play -operator
A misère-play -operator Matthieu Dufour Silvia Heubach Urban Larsson arxiv:1608.06996v1 [math.co] 25 Aug 2016 July 31, 2018 Abstract We study the -operator (Larsson et al, 2011) of impartial vector subtraction
More information3 Integration and Expectation
3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ
More informationSubwords in reverse-complement order - Extended abstract
Subwords in reverse-complement order - Extended abstract Péter L. Erdős A. Rényi Institute of Mathematics Hungarian Academy of Sciences, Budapest, P.O. Box 127, H-164 Hungary elp@renyi.hu Péter Ligeti
More informationA COMPOUND POISSON APPROXIMATION INEQUALITY
J. Appl. Prob. 43, 282 288 (2006) Printed in Israel Applied Probability Trust 2006 A COMPOUND POISSON APPROXIMATION INEQUALITY EROL A. PEKÖZ, Boston University Abstract We give conditions under which the
More informationCombinatorial Batch Codes and Transversal Matroids
Combinatorial Batch Codes and Transversal Matroids Richard A. Brualdi, Kathleen P. Kiernan, Seth A. Meyer, Michael W. Schroeder Department of Mathematics University of Wisconsin Madison, WI 53706 {brualdi,kiernan,smeyer,schroede}@math.wisc.edu
More informationA Graph Polynomial Approach to Primitivity
A Graph Polynomial Approach to Primitivity F. Blanchet-Sadri 1, Michelle Bodnar 2, Nathan Fox 3, and Joe Hidakatsu 2 1 Department of Computer Science, University of North Carolina, P.O. Box 26170, Greensboro,
More informationA PASCAL-LIKE BOUND FOR THE NUMBER OF NECKLACES WITH FIXED DENSITY
A PASCAL-LIKE BOUND FOR THE NUMBER OF NECKLACES WITH FIXED DENSITY I. HECKENBERGER AND J. SAWADA Abstract. A bound resembling Pascal s identity is presented for binary necklaces with fixed density using
More informationIntroduction. Probability and distributions
Introduction. Probability and distributions Joe Felsenstein Genome 560, Spring 2011 Introduction. Probability and distributions p.1/18 Probabilities We will assume you know that Probabilities of mutually
More informationOikos. Appendix 1 and 2. o20751
Oikos o20751 Rosindell, J. and Cornell, S. J. 2013. Universal scaling of species-abundance distributions across multiple scales. Oikos 122: 1101 1111. Appendix 1 and 2 Universal scaling of species-abundance
More informationEXPLICIT DISTRIBUTIONAL RESULTS IN PATTERN FORMATION. By V. T. Stefanov 1 and A. G. Pakes University of Western Australia
The Annals of Applied Probability 1997, Vol. 7, No. 3, 666 678 EXPLICIT DISTRIBUTIONAL RESULTS IN PATTERN FORMATION By V. T. Stefanov 1 and A. G. Pakes University of Western Australia A new and unified
More information1 + lim. n n+1. f(x) = x + 1, x 1. and we check that f is increasing, instead. Using the quotient rule, we easily find that. 1 (x + 1) 1 x (x + 1) 2 =
Chapter 5 Sequences and series 5. Sequences Definition 5. (Sequence). A sequence is a function which is defined on the set N of natural numbers. Since such a function is uniquely determined by its values
More informationST5215: Advanced Statistical Theory
Department of Statistics & Applied Probability Monday, September 26, 2011 Lecture 10: Exponential families and Sufficient statistics Exponential Families Exponential families are important parametric families
More informationOptimal primitive sets with restricted primes
Optimal primitive sets with restricted primes arxiv:30.0948v [math.nt] 5 Jan 203 William D. Banks Department of Mathematics University of Missouri Columbia, MO 652 USA bankswd@missouri.edu Greg Martin
More informationTHIS work is motivated by the goal of finding the capacity
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 8, AUGUST 2007 2693 Improved Lower Bounds for the Capacity of i.i.d. Deletion Duplication Channels Eleni Drinea, Member, IEEE, Michael Mitzenmacher,
More informationQuivers of Period 2. Mariya Sardarli Max Wimberley Heyi Zhu. November 26, 2014
Quivers of Period 2 Mariya Sardarli Max Wimberley Heyi Zhu ovember 26, 2014 Abstract A quiver with vertices labeled from 1,..., n is said to have period 2 if the quiver obtained by mutating at 1 and then
More informationPOLYNOMIAL ALGORITHM FOR FIXED POINTS OF NONTRIVIAL MORPHISMS
POLYNOMIAL ALGORITHM FOR FIXED POINTS OF NONTRIVIAL MORPHISMS Abstract. A word w is a fixed point of a nontrival morphism h if w = h(w) and h is not the identity on the alphabet of w. The paper presents
More informationFinding Anchors for Genomic Sequence Comparison ABSTRACT
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 12, Number 6, 2005 Mary Ann Liebert, Inc. Pp. 762 776 Finding Anchors for Genomic Sequence Comparison ROSS A. LIPPERT, 1,4 XIAOYUE ZHAO, 2 LILIANA FLOREA, 1,3 CLARK
More informationSimulation. Version 1.1 c 2010, 2009, 2001 José Fernando Oliveira Maria Antónia Carravilla FEUP
Simulation Version 1.1 c 2010, 2009, 2001 José Fernando Oliveira Maria Antónia Carravilla FEUP Systems - why simulate? System any object or entity on which a particular study is run. Model representation
More informationFermat numbers and integers of the form a k + a l + p α
ACTA ARITHMETICA * (200*) Fermat numbers and integers of the form a k + a l + p α by Yong-Gao Chen (Nanjing), Rui Feng (Nanjing) and Nicolas Templier (Montpellier) 1. Introduction. In 1849, A. de Polignac
More informationA note on the decidability of subword inequalities
A note on the decidability of subword inequalities Szilárd Zsolt Fazekas 1 Robert Mercaş 2 August 17, 2012 Abstract Extending the general undecidability result concerning the absoluteness of inequalities
More informationNucleic Acids Research
Volume 11 Number 198 Nucleic Acids Research Frequencies of restriction sites Michael S.Waterman Departments of Mathematics Biology, University of Southern California, Los Angeles, CA 90089, USA Received
More informationNaive Bayesian classifiers for multinomial features: a theoretical analysis
Naive Bayesian classifiers for multinomial features: a theoretical analysis Ewald van Dyk 1, Etienne Barnard 2 1,2 School of Electrical, Electronic and Computer Engineering, University of North-West, South
More informationOptimal Superprimitivity Testing for Strings
Optimal Superprimitivity Testing for Strings Alberto Apostolico Martin Farach Costas S. Iliopoulos Fibonacci Report 90.7 August 1990 - Revised: March 1991 Abstract A string w covers another string z if
More informationBLAST: Basic Local Alignment Search Tool
.. CSC 448 Bioinformatics Algorithms Alexander Dekhtyar.. (Rapid) Local Sequence Alignment BLAST BLAST: Basic Local Alignment Search Tool BLAST is a family of rapid approximate local alignment algorithms[2].
More informationFamilies that remain k-sperner even after omitting an element of their ground set
Families that remain k-sperner even after omitting an element of their ground set Balázs Patkós Submitted: Jul 12, 2012; Accepted: Feb 4, 2013; Published: Feb 12, 2013 Mathematics Subject Classifications:
More informationTopic: Primal-Dual Algorithms Date: We finished our discussion of randomized rounding and began talking about LP Duality.
CS787: Advanced Algorithms Scribe: Amanda Burton, Leah Kluegel Lecturer: Shuchi Chawla Topic: Primal-Dual Algorithms Date: 10-17-07 14.1 Last Time We finished our discussion of randomized rounding and
More informationStatistics 150: Spring 2007
Statistics 150: Spring 2007 April 23, 2008 0-1 1 Limiting Probabilities If the discrete-time Markov chain with transition probabilities p ij is irreducible and positive recurrent; then the limiting probabilities
More informationA Note on Poisson Approximation for Independent Geometric Random Variables
International Mathematical Forum, 4, 2009, no, 53-535 A Note on Poisson Approximation for Independent Geometric Random Variables K Teerapabolarn Department of Mathematics, Faculty of Science Burapha University,
More informationCombinatorial Optimization
Combinatorial Optimization 2017-2018 1 Maximum matching on bipartite graphs Given a graph G = (V, E), find a maximum cardinal matching. 1.1 Direct algorithms Theorem 1.1 (Petersen, 1891) A matching M is
More informationDescriptional Complexity of Formal Systems (Draft) Deadline for submissions: April 25, 2010 Final versions: July 8, 2010
DCFS 2010 Descriptional Complexity of Formal Systems (Draft) Deadline for submissions: April 25, 2010 Final versions: July 8, 2010 On the Complexity of the Evaluation of Transient Extensions of Boolean
More informationReducing storage requirements for biological sequence comparison
Bioinformatics Advance Access published July 15, 2004 Bioinfor matics Oxford University Press 2004; all rights reserved. Reducing storage requirements for biological sequence comparison Michael Roberts,
More informationNote Watson Crick D0L systems with regular triggers
Theoretical Computer Science 259 (2001) 689 698 www.elsevier.com/locate/tcs Note Watson Crick D0L systems with regular triggers Juha Honkala a; ;1, Arto Salomaa b a Department of Mathematics, University
More informationChapter One. The Real Number System
Chapter One. The Real Number System We shall give a quick introduction to the real number system. It is imperative that we know how the set of real numbers behaves in the way that its completeness and
More informationTHE LECTURE HALL PARALLELEPIPED
THE LECTURE HALL PARALLELEPIPED FU LIU AND RICHARD P. STANLEY Abstract. The s-lecture hall polytopes P s are a class of integer polytopes defined by Savage and Schuster which are closely related to the
More informationLecture 4: September 19
CSCI1810: Computational Molecular Biology Fall 2017 Lecture 4: September 19 Lecturer: Sorin Istrail Scribe: Cyrus Cousins Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes
More informationRMT 2014 Power Round Solutions February 15, 2014
Introduction This Power Round develops the many and varied properties of the Thue-Morse sequence, an infinite sequence of 0s and 1s which starts 0, 1, 1, 0, 1, 0, 0, 1,... and appears in a remarkable number
More informationPoisson Approximation for Independent Geometric Random Variables
International Mathematical Forum, 2, 2007, no. 65, 3211-3218 Poisson Approximation for Independent Geometric Random Variables K. Teerapabolarn 1 and P. Wongasem Department of Mathematics, Faculty of Science
More informationarxiv: v1 [math.ds] 1 Oct 2014
A NOTE ON THE POINTS WITH DENSE ORBIT UNDER AND MAPS arxiv:1410.019v1 [math.ds] 1 Oct 014 AUTHOR ONE, AUTHOR TWO, AND AUTHOR THREE Abstract. It was conjectured by Furstenberg that for any x [0,1]\Q, dim
More informationGames derived from a generalized Thue-Morse word
Games derived from a generalized Thue-Morse word Aviezri S. Fraenkel, Dept. of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; fraenkel@wisdom.weizmann.ac.il
More informationElementary divisors of Cartan matrices for symmetric groups
Elementary divisors of Cartan matrices for symmetric groups By Katsuhiro UNO and Hiro-Fumi YAMADA Abstract In this paper, we give an easy description of the elementary divisors of the Cartan matrices for
More informationRandomized Algorithms
Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours
More informationCombinatorial Proof of the Hot Spot Theorem
Combinatorial Proof of the Hot Spot Theorem Ernie Croot May 30, 2006 1 Introduction A problem which has perplexed mathematicians for a long time, is to decide whether the digits of π are random-looking,
More informationComplex Analysis Slide 9: Power Series
Complex Analysis Slide 9: Power Series MA201 Mathematics III Department of Mathematics IIT Guwahati August 2015 Complex Analysis Slide 9: Power Series 1 / 37 Learning Outcome of this Lecture We learn Sequence
More informationMath 730 Homework 6. Austin Mohr. October 14, 2009
Math 730 Homework 6 Austin Mohr October 14, 2009 1 Problem 3A2 Proposition 1.1. If A X, then the family τ of all subsets of X which contain A, together with the empty set φ, is a topology on X. Proof.
More informationPROBABILISTIC METHODS FOR A LINEAR REACTION-HYPERBOLIC SYSTEM WITH CONSTANT COEFFICIENTS
The Annals of Applied Probability 1999, Vol. 9, No. 3, 719 731 PROBABILISTIC METHODS FOR A LINEAR REACTION-HYPERBOLIC SYSTEM WITH CONSTANT COEFFICIENTS By Elizabeth A. Brooks National Institute of Environmental
More informationFILE NAME: PYTHAGOREAN_TRIPLES_012-( WED).DOCX AUTHOR: DOUG JONES
FILE NAME: PYTHAGOREAN_TRIPLES_01-(0090107WED.DOCX AUTHOR: DOUG JONES A. BACKGROUND & PURPOSE THE SEARCH FOR PYTHAGOREAN TRIPLES 1. Three positive whole numbers ( a,b,c which are such that a + b = c are
More informationDeviation from mean in sequence comparison with a periodic sequence
Alea 3, 1 29 (2007) Deviation from mean in sequence comparison with a periodic sequence Heinrich Matzinger, Jüri Lember and Clement Durringer Heinrich Matzinger, University of Bielefeld, Postfach 10 01
More informationMinimal basis for connected Markov chain over 3 3 K contingency tables with fixed two-dimensional marginals. Satoshi AOKI and Akimichi TAKEMURA
Minimal basis for connected Markov chain over 3 3 K contingency tables with fixed two-dimensional marginals Satoshi AOKI and Akimichi TAKEMURA Graduate School of Information Science and Technology University
More informationAn Introduction to Combinatorics
Chapter 1 An Introduction to Combinatorics What Is Combinatorics? Combinatorics is the study of how to count things Have you ever counted the number of games teams would play if each team played every
More informationA Questionable Distance-Regular Graph
A Questionable Distance-Regular Graph Rebecca Ross Abstract In this paper, we introduce distance-regular graphs and develop the intersection algebra for these graphs which is based upon its intersection
More informationOn the discrepancy of circular sequences of reals
On the discrepancy of circular sequences of reals Fan Chung Ron Graham Abstract In this paper we study a refined measure of the discrepancy of sequences of real numbers in [0, ] on a circle C of circumference.
More informationRadiological Control Technician Training Fundamental Academic Training Study Guide Phase I
Module 1.01 Basic Mathematics and Algebra Part 4 of 9 Radiological Control Technician Training Fundamental Academic Training Phase I Coordinated and Conducted for the Office of Health, Safety and Security
More informationThe Asymptotic Expansion of a Generalised Mathieu Series
Applied Mathematical Sciences, Vol. 7, 013, no. 15, 609-616 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.1988/ams.013.3949 The Asymptotic Expansion of a Generalised Mathieu Series R. B. Paris School
More informationRandom Feature Maps for Dot Product Kernels Supplementary Material
Random Feature Maps for Dot Product Kernels Supplementary Material Purushottam Kar and Harish Karnick Indian Institute of Technology Kanpur, INDIA {purushot,hk}@cse.iitk.ac.in Abstract This document contains
More informationThe chromatic number of ordered graphs with constrained conflict graphs
AUSTRALASIAN JOURNAL OF COMBINATORICS Volume 69(1 (017, Pages 74 104 The chromatic number of ordered graphs with constrained conflict graphs Maria Axenovich Jonathan Rollin Torsten Ueckerdt Department
More informationPrime Number Theory and the Riemann Zeta-Function
5262589 - Recent Perspectives in Random Matrix Theory and Number Theory Prime Number Theory and the Riemann Zeta-Function D.R. Heath-Brown Primes An integer p N is said to be prime if p and there is no
More informationADVANCED CALCULUS - MTH433 LECTURE 4 - FINITE AND INFINITE SETS
ADVANCED CALCULUS - MTH433 LECTURE 4 - FINITE AND INFINITE SETS 1. Cardinal number of a set The cardinal number (or simply cardinal) of a set is a generalization of the concept of the number of elements
More informationZero-sum square matrices
Zero-sum square matrices Paul Balister Yair Caro Cecil Rousseau Raphael Yuster Abstract Let A be a matrix over the integers, and let p be a positive integer. A submatrix B of A is zero-sum mod p if the
More informationRefining the Central Limit Theorem Approximation via Extreme Value Theory
Refining the Central Limit Theorem Approximation via Extreme Value Theory Ulrich K. Müller Economics Department Princeton University February 2018 Abstract We suggest approximating the distribution of
More informationBLAST: Target frequencies and information content Dannie Durand
Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences
More informationRegular finite Markov chains with interval probabilities
5th International Symposium on Imprecise Probability: Theories and Applications, Prague, Czech Republic, 2007 Regular finite Markov chains with interval probabilities Damjan Škulj Faculty of Social Sciences
More information1 Inverse Transform Method and some alternative algorithms
Copyright c 2016 by Karl Sigman 1 Inverse Transform Method and some alternative algorithms Assuming our computer can hand us, upon demand, iid copies of rvs that are uniformly distributed on (0, 1), it
More informationGödel s Incompleteness Theorems
Seminar Report Gödel s Incompleteness Theorems Ahmet Aspir Mark Nardi 28.02.2018 Supervisor: Dr. Georg Moser Abstract Gödel s incompleteness theorems are very fundamental for mathematics and computational
More informationThe concentration of the chromatic number of random graphs
The concentration of the chromatic number of random graphs Noga Alon Michael Krivelevich Abstract We prove that for every constant δ > 0 the chromatic number of the random graph G(n, p) with p = n 1/2
More informationSome hard families of parameterised counting problems
Some hard families of parameterised counting problems Mark Jerrum and Kitty Meeks School of Mathematical Sciences, Queen Mary University of London {m.jerrum,k.meeks}@qmul.ac.uk September 2014 Abstract
More informationTAYLOR AND MACLAURIN SERIES
TAYLOR AND MACLAURIN SERIES. Introduction Last time, we were able to represent a certain restricted class of functions as power series. This leads us to the question: can we represent more general functions
More information1 The linear algebra of linear programs (March 15 and 22, 2015)
1 The linear algebra of linear programs (March 15 and 22, 2015) Many optimization problems can be formulated as linear programs. The main features of a linear program are the following: Variables are real
More informationNORMAL GROWTH OF LARGE GROUPS
NORMAL GROWTH OF LARGE GROUPS Thomas W. Müller and Jan-Christoph Puchta Abstract. For a finitely generated group Γ, denote with s n(γ) the number of normal subgroups of index n. A. Lubotzky proved that
More informationMargarita Karaliopoulou 1. Introduction
ESAIM: PS July 2009, Vol. 13, p. 328 342 DOI: 10.1051/ps:2008009 ESAIM: Probability and Statistics www.esaim-ps.org ON THE NUMBER OF WORD OCCURRENCES IN A SEMI-MARKOV SEQUENCE OF LETTERS Margarita Karaliopoulou
More informationModel theory of bounded arithmetic with applications to independence results. Morteza Moniri
Model theory of bounded arithmetic with applications to independence results Morteza Moniri Abstract In this paper we apply some new and some old methods in order to construct classical and intuitionistic
More informationConfidence and prediction intervals for. generalised linear accident models
Confidence and prediction intervals for generalised linear accident models G.R. Wood September 8, 2004 Department of Statistics, Macquarie University, NSW 2109, Australia E-mail address: gwood@efs.mq.edu.au
More informationEquality of P-partition Generating Functions
Bucknell University Bucknell Digital Commons Honors Theses Student Theses 2011 Equality of P-partition Generating Functions Ryan Ward Bucknell University Follow this and additional works at: https://digitalcommons.bucknell.edu/honors_theses
More informationarxiv: v1 [math.nt] 8 Jan 2014
A NOTE ON p-adic VALUATIONS OF THE SCHENKER SUMS PIOTR MISKA arxiv:1401.1717v1 [math.nt] 8 Jan 2014 Abstract. A prime number p is called a Schenker prime if there exists such n N + that p nandp a n, wherea
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationOn avoidability of formulas with reversal
arxiv:1703.10522v1 [math.co] 30 Mar 2017 On avoidability of formulas with reversal James Currie, Lucas Mol, and Narad ampersad Abstract While a characterization of unavoidable formulas (without reversal)
More informationOn the Entropy of Sums of Bernoulli Random Variables via the Chen-Stein Method
On the Entropy of Sums of Bernoulli Random Variables via the Chen-Stein Method Igal Sason Department of Electrical Engineering Technion - Israel Institute of Technology Haifa 32000, Israel ETH, Zurich,
More informationJournal of Inequalities in Pure and Applied Mathematics
Journal of Inequalities in Pure and Applied Mathematics ASYMPTOTIC EXPANSION OF THE EQUIPOISE CURVE OF A POLYNOMIAL INEQUALITY ROGER B. EGGLETON AND WILLIAM P. GALVIN Department of Mathematics Illinois
More informationSquare-free words with square-free self-shuffles
Square-free words with square-free self-shuffles James D. Currie & Kalle Saari Department of Mathematics and Statistics University of Winnipeg 515 Portage Avenue Winnipeg, MB R3B 2E9, Canada j.currie@uwinnipeg.ca,
More informationIsomorphisms between pattern classes
Journal of Combinatorics olume 0, Number 0, 1 8, 0000 Isomorphisms between pattern classes M. H. Albert, M. D. Atkinson and Anders Claesson Isomorphisms φ : A B between pattern classes are considered.
More informationA Search for Large Twin Prime Pairs. By R. E. Crandall and M. A. Penk. Abstract. Two methods are discussed for finding large integers m such that m I
MATHEMATICS OF COMPUTATION, VOLUME 33, NUMBER 145 JANUARY 1979, PAGES 383-388 A Search for Large Twin Prime Pairs By R. E. Crandall and M. A. Penk Abstract. Two methods are discussed for finding large
More informationA process capability index for discrete processes
Journal of Statistical Computation and Simulation Vol. 75, No. 3, March 2005, 175 187 A process capability index for discrete processes MICHAEL PERAKIS and EVDOKIA XEKALAKI* Department of Statistics, Athens
More informationMaximizing the number of independent sets of a fixed size
Maximizing the number of independent sets of a fixed size Wenying Gan Po-Shen Loh Benny Sudakov Abstract Let i t (G be the number of independent sets of size t in a graph G. Engbers and Galvin asked how
More informationLeast singular value of random matrices. Lewis Memorial Lecture / DIMACS minicourse March 18, Terence Tao (UCLA)
Least singular value of random matrices Lewis Memorial Lecture / DIMACS minicourse March 18, 2008 Terence Tao (UCLA) 1 Extreme singular values Let M = (a ij ) 1 i n;1 j m be a square or rectangular matrix
More informationStochastic Hybrid Systems: Applications to Communication Networks
research supported by NSF Stochastic Hybrid Systems: Applications to Communication Networks João P. Hespanha Center for Control Engineering and Computation University of California at Santa Barbara Talk
More information