PARALLEL ALGORITHMS FOR MAPPING SHORT DEGENERATE AND WEIGHTED DNA SEQUENCES TO A REFERENCE GENOME


International Journal of Foundations of Computer Science
© World Scientific Publishing Company

COSTAS S. ILIOPOULOS
Dept. of Computer Science, King's College London, London WC2R 2LS, UK
& Digital Ecosystems & Business Intelligence Institute, Curtin University, GPO Box U1987 Perth WA 6845, Australia
csi@dcs.kcl.ac.uk

MIRKA MILLER
School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan NSW 2308, Australia
& Dept. of Mathematics, University of West Bohemia, Pilsen, Czech Republic
& Dept. of Computer Science, King's College London, London WC2R 2LS, UK
mirka.miller@newcastle.edu.au

SOLON P. PISSIS
Dept. of Computer Science, King's College London, London WC2R 2LS, UK
solon.pissis@kcl.ac.uk

Received (Day Month Year)
Accepted (Day Month Year)
Communicated by (xxxxxxxxxx)

One of the most ambitious trends in current biomedical research is the large-scale genomic sequencing of patients. Novel high-throughput (or next-generation) sequencing technologies have redefined the way genome sequencing is performed. They are able to produce millions of short sequences (reads) in a single experiment, and with a much lower cost than previously possible. Due to this massive amount of data, efficient algorithms for mapping these sequences to a reference genome are in great demand, and recently, there has been ample work for publishing such algorithms. One important feature of these algorithms is the support of multithreaded parallel computing in order to speed up the mapping process. In this paper, we design parallel algorithms, which make use of the message-passing parallelism model, to address this problem efficiently. The proposed algorithms also take into consideration the probability scores assigned to each base for occurring in a specific position of a sequence. In particular, we present parallel algorithms for mapping short degenerate and weighted DNA sequences to a reference genome.

Keywords: parallel algorithms; string algorithms; next-generation sequencing

Mathematics Subject Classification: 22E46, 53C35, 57S20

1. Introduction

The traditional Sanger capillary sequencing methods [22, 23], developed in the mid-1970s, have been the workhorse technology for DNA sequencing for almost 30 years, and are still the go-to technique for high-quality sequencing. But sequencing technology has come a long way since the time when traditional sequencing techniques required many labs around the world to cooperate for over a decade in order to sequence the human genome for the first time. Nowadays, high-throughput Sequencing By Synthesis technologies have reduced the task of sequencing a whole genome to a matter of days or even hours, and the cost has decreased by orders of magnitude, making it an accessible experimental procedure to many labs [27]. This opened the door for re-sequencing to start becoming a more routine procedure, as it finds many applications in the detection of genetic variability among individuals. It can help us understand the extent of that variability, and also identify specific variants, alternative splicing sites and patterns, and epigenetic effects, and relate them to gene regulation and expression, as well as to diseases ([1], [28], [29], [19]). Thus, DNA sequencing is quickly becoming a powerful tool in diagnostic medicine, and eventually personalized treatment [27].

The data resulting from a single sequencing experiment can be quite large, and it is not uncommon to have data from multiple experiments. This trend of increasing availability of sequencing data will continue as projects even more ambitious than the 1000 Genomes Project [1] start to materialize. According to their respective websites, typical output sizes for the three main next-generation sequencing platforms are: over a million 400bp-long reads per 10-hour run for the 454/Roche platform [3], up to 300GB per run for the ABI SOLiD platform [2], and up to 500 million paired-end reads 100bp-long for the Illumina GA [4].

In most cases these reads are too short to be directly assembled, especially in the presence of repetitive regions [18], therefore a reference sequence is usually required. In the case of human genome re-sequencing, the reference genome is approximately 3Gb-long. However, attempts to directly assemble short reads from simpler genomes have begun [25], and a first attempt for human data has also been recently reported [17]. Mapping so many short reads onto such a long reference sequence is a very challenging task that cannot be adequately carried out by traditional search and alignment algorithms [11] like BLAST [5] and FASTA [20], so a broad array of programs has been published to address this task, placing emphasis on different aspects of the challenge. The different algorithms implement various combinations of innovations and trade-offs, to address computing speed, system resources requirements, and biological relevance and accuracy of the computed results.

The need for more efficient ways to map large numbers of short sequences was first acknowledged in 2002 and involved modifying the BLAST [5] algorithm so as to index the reference instead of the queries [11]. But really fast and efficient mapping software started with ELAND [8], which is the software bundled in the Illumina GA pipeline, and with constant development to match the advances of the Illumina platform, it is still one of the fastest algorithms. MAQ [14] was released as an independent alternative to ELAND. It makes use of base-calling qualities and introduced mapping qualities, but cannot do gapped alignment, and has an upper limit to the length of reads it can map. Indexing the reads also potentially imposes a high demand on system resources, limiting the scalability of the method. SOAP [15] indexes the reference for more efficient memory usage and offers some form of gapped alignment, while SeqMap [10] allows more flexibility for gaps and substitutions. Bowtie [12], SOAP2 [16] and BWA [13] (the successor of MAQ) make use of the Burrows-Wheeler Transform [7] to index the reference, and are able to achieve very good speed and relatively low memory usage. A number of other tools exist as well (REAL [9], SHRiMP [21], GSNAP [28]), each combining solutions differently and to different extents.

Many of these applications show the necessity for a measure of accuracy concerning the mapping methods. Accuracy can be quantified in terms of sensitivity and specificity. Possible causes of limitations in the accuracy of these experiments include sequencing errors, variation between the sample and the reference genome, as well as ambiguities caused by repeats in the reference genome [26]. Therefore, the limitations of the equipment used, or the natural polymorphisms that can be observed between individual samples, can give rise to uncertain sequences. These uncertain sequences are called degenerate or indeterminate. Every entry in a degenerate string is a subset of the given alphabet. Very often, each position of a sequence is accompanied by probabilities of each symbol occurring in the specific position. In the case of the high-throughput experiments, the base-calling qualities, which accompany the raw sequence data, describe the confidence of bases in each read [26]. The sequencing base-calling qualities assign a probability to the four possible nucleotides for each sequenced base. Bases with low quality are more likely to be sequencing errors. These sequences, where the probability of every symbol's occurrence at every location is given, are called weighted sequences.

In this paper, we design parallel algorithms for addressing the problem of efficiently mapping millions of short degenerate and weighted DNA sequences to a reference genome. The proposed parallel algorithms make use of the message-passing paradigm, and resemble the sequential strategies presented in [6] and [9]. Our approach distributes the genomic sequence among the available processors, and preprocesses it, by using word-level parallelism and parallel sorting. Then, we distribute the queries among the available processors, and use the pigeonhole principle, binary search, and simple word-level operations to speed up the mapping process.

The rest of the paper is organised as follows. In Section 2, the basic definitions that are used throughout the paper are presented. Section 3 formally defines the problems solved in this paper. Section 4 and Section 5 present the proposed algorithms for exact and approximate matching, respectively. Finally, we briefly conclude with some future proposals in Section 6.
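As an informal illustration of the strategy outlined in this introduction (it is not taken from [6] or [9]), the following sequential Python sketch packs every length-l factor of a reference into an integer using two bits per base, sorts the resulting values, and answers an exact query by binary search. The function names, the bit assignment A=0, C=1, G=2, T=3 and the use of 0-based positions are assumptions made for the example; the distribution among processors and the parallel sort are replaced here by a single in-memory sort.

```python
from bisect import bisect_left

# 2-bits-per-base encoding of the DNA alphabet (this particular assignment is an assumption).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def signature(s):
    """Pack a solid DNA string into one integer, 2 bits per base."""
    sig = 0
    for ch in s:
        sig = (sig << 2) | CODE[ch]
    return sig


def build_index(text, l):
    """Return the sorted list of (signature, start) pairs for all length-l factors.

    After the first factor, each signature is derived from the previous one
    with a shift, a mask and an OR, so each update costs constant time.
    """
    if len(text) < l:
        return []
    mask = (1 << (2 * l)) - 1
    sig = signature(text[:l])
    entries = [(sig, 0)]
    for i in range(1, len(text) - l + 1):
        sig = ((sig << 2) & mask) | CODE[text[i + l - 1]]
        entries.append((sig, i))
    entries.sort()  # a sequential stand-in for the parallel sort
    return entries


def occurrences(index, pattern):
    """Binary-search the sorted index for every exact occurrence of pattern."""
    sig = signature(pattern)
    pos = bisect_left(index, (sig, -1))
    hits = []
    while pos < len(index) and index[pos][0] == sig:
        hits.append(index[pos][1])
        pos += 1
    return hits


# Usage: index a toy reference and look up two length-4 patterns.
index = build_index("ACGTACGTTACG", 4)
hits_acgt = occurrences(index, "ACGT")
hits_tacg = occurrences(index, "TACG")
```

Python integers are unbounded, so the sketch does not need the assumption that a signature fits in one machine word; a word-level implementation would have to handle that case explicitly.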

2. Preliminaries

A string is a sequence of zero or more symbols from an alphabet $\Sigma$. In this work, we are considering the finite alphabet $\Sigma$ for DNA sequences, where $\Sigma = \{A, C, G, T\}$. The length of a string $x$ is denoted by $|x|$. The $i$-th symbol of a string $x$ is denoted by $x[i]$. A string $w$ is a factor of $x$ if $x = uwv$, where $u, v \in \Sigma^*$. We denote by $x[i..j]$ the factor of $x$ that starts at position $i$ and ends at position $j$.

In a string $x$ on an alphabet $\Sigma$, a position $i$ is said to be degenerate or indeterminate iff $x[i]$ may be any one of a specified subset $\{\lambda_1, \lambda_2, \ldots, \lambda_j\}$ of $\Sigma$, $2 \le j \le |\Sigma|$, and $x[i]$ itself is said to be a degenerate symbol. If $x[i]$ is a one-element subset it is called solid, and non-solid otherwise. A string that may contain degenerate symbols is said to be degenerate (or indeterminate). For two strings $x$ and $y$, such that only $y$ is degenerate and $|x| = |y|$, the Hamming distance $\delta_H(x, y)$ is the number of places in which $x[i] \notin y[i]$. Formally, $\delta_H(x, y) = \sum_{i=1}^{|x|} 1_{x[i] \notin y[i]}$, where $1_{x[i] \notin y[i]} = 1$ if $x[i] \notin y[i]$, and $0$ otherwise.

A weighted string over alphabet $\Sigma$ is a sequence $x = x[1..n]$ of sets of pairs. In particular, each $x[i]$ is a set of pairs $((q_1, \pi_i(q_1)), (q_2, \pi_i(q_2)), \ldots, (q_{|\Sigma|}, \pi_i(q_{|\Sigma|})))$, where $\pi_i(q_j)$ is the occurrence probability of symbol $q_j$ at position $i$. A symbol $q_j$ occurs at position $i$ of a weighted sequence $x = x[1..n]$ iff the probability of occurrence of symbol $q_j$ at position $i$ is greater than zero, i.e. $\pi_i(q_j) > 0$. For every position $1 \le i \le n$, $\sum_{j=1}^{|\Sigma|} \pi_i(q_j) = 1$. For example, (A 0.8, C 0.2) is a non-solid symbol, implying that base A occurs with probability 80% and C with probability 20%.

The proposed parallel algorithms make use of the message-passing paradigm, by using $p$ processing elements. The following assumptions for the model of communications in the parallel computer are made. The parallel computer comprises a number of nodes. Each node comprises one or several identical processors interconnected by a switched communication network. The time taken to send a message of size $n$ between any two nodes is independent of the distance between nodes and can be modeled as $t_{comm} = t_s + n t_w$, where $t_s$ is the latency or start-up time of the message, and $t_w$ is the transfer time per data unit. The links between two nodes are full-duplex and single-ported: a message can be transferred in both directions by the link at the same time, and only one message can be sent and one message can be received at the same time.

3. Problems definition

We denote the generated short sequences as the set $p_1, p_2, \ldots, p_r$, where $r$ is a natural number ($r > 10^7$ in practice), and we call them patterns. Each pattern is currently typically between 25 and 75 bp long. Without loss of generality, we denote the length of the patterns as $l$. We assume that the data is derived from high-quality sequencing methods, and therefore we will only consider degenerate or weighted patterns with at most $\mu \le l$ non-solid symbols. The given text is a genomic sequence $t = t[1..n]$, where $n > 10^8$, and we are also given a non-negative threshold $k \ge 0$.
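To make the definitions of Section 2 concrete, the following Python sketch models a degenerate string as a list of symbol sets and a weighted string as a list of symbol-to-probability dictionaries. These representations and all function names are assumptions chosen for illustration and do not come from the paper. The sketch checks that the probabilities at each position sum to 1, derives the degenerate string of occurring symbols, counts non-solid positions (the quantity bounded by $\mu$), and evaluates $\delta_H(x, y)$ for a solid $x$ and a degenerate $y$.

```python
from math import isclose


def occurring_symbols(weighted):
    """Turn a weighted string into a degenerate one: keep symbols with probability > 0."""
    return [{sym for sym, prob in pos.items() if prob > 0} for pos in weighted]


def is_valid_weighted(weighted):
    """Check that the probabilities at every position sum to 1."""
    return all(isclose(sum(pos.values()), 1.0) for pos in weighted)


def non_solid_count(degenerate):
    """Number of positions whose symbol set has more than one element."""
    return sum(1 for symbols in degenerate if len(symbols) > 1)


def hamming_degenerate(x, y):
    """delta_H(x, y): number of positions i where the solid symbol x[i] is not in the set y[i]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(1 for a, symbols in zip(x, y) if a not in symbols)


# Usage: the weighted symbol (A 0.8, C 0.2) from the text, followed by two solid symbols.
weighted = [{"A": 0.8, "C": 0.2}, {"G": 1.0}, {"T": 1.0}]
pattern = occurring_symbols(weighted)
```

In this toy pattern only the first position is non-solid, so a text factor beginning with A or C and continuing with GT is at Hamming distance 0 from it.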

We formally define the problem of mapping short degenerate and weighted sequences to a reference genome, as follows.

Problem 1. Find whether the degenerate pattern $p_i = p_i[1..l]$, for all $1 \le i \le r$, with at most $\mu$ non-solid symbols, occurs with at most $k$ mismatches in text $t = t[1..n]$.

Problem 2. Find whether the weighted pattern $p_i = p_i[1..l]$, for all $1 \le i \le r$, with at most $\mu$ non-solid symbols, occurs with at most $k$ mismatches in text $t = t[1..n]$, with probability at least $\gamma$, i.e. $\prod_{i=1}^{l} \pi_i(q_j) \ge \gamma$, $q_j \in \Sigma$, $1 \le j \le |\Sigma|$.

The case that $k > 0$ corresponds to the possibility that the pattern contains either a sequencing error, or a small difference between a mutant and the reference genome.

4. Exact Pattern Matching

In this section, the focus is to find all the occurrences of a degenerate or a weighted pattern $p_i$, for all $1 \le i \le r$, in text $t = t[1..n]$, with no mismatches, i.e. $k = 0$. In order for the procedure to be efficient, each processor makes use of word-level parallelism by transforming each factor of length $l$ of $t$ into a signature. We get the signature $\sigma(x)$ of a string $x$ by transforming it to its binary equivalent using 2-bits-per-base encoding of the DNA alphabet, and storing its decimal value into a computer word. For simplicity, we assume that the signature fits in the computer word, i.e. $2l \le w$, where $w$ is the word size of the machine (e.g. 32 or 64 in practice). Notice that, in the case that $2l > w$, the proposed algorithm can easily be adapted by storing the signatures in $\lceil 2l/w \rceil$ computer words. Our aim is to preprocess the text $t$ and create a local list $L$ on each processor. List $L$ holds elements of type $e_i = (i, \sigma(z_i))$, where $i$ represents the starting position of factor $z_i = t[i..i+l-1]$.

An outline of Algorithm I is as follows.

Problem Partitioning. We use a data decomposition approach to partition the text $t$ with the sliding window mechanism into a set of factors $z_1, z_2, \ldots, z_{n-l+1}$ of length $l$, where $z_i = t[i..i+l-1]$, for all $1 \le i \le n-l+1$.

Step 1. We assume that text $t$ is stored locally on the master processor. We make sure that the load is evenly balanced by distributing the factors of $t$ among the $p$ available processors. Without loss of generality, we denote as $z_{first_\rho}, \ldots, z_{last_\rho}$ the set of the allocated factors of processor $\rho$, for all $0 \le \rho < p$.

Step 2. Each processor $\rho$, for all $0 \le \rho < p$, transforms each allocated factor $z_i$, for all $first_\rho \le i \le last_\rho$, into a signature $\sigma(z_i)$, packs it in an element $e_i = (i, \sigma(z_i))$, and adds $e_i$ to a local list $Z_\rho$. As soon as processor $\rho$ computes $\sigma(z_{first_\rho})$, each $\sigma(z_i)$, for all $first_\rho + 1 \le i \le last_\rho$, can be retrieved in constant time (using "shift"-type operations).

Step 3. We sort the elements of the local lists $Z_\rho$, for all $0 \le \rho < p$, based on the signature field, in parallel, using Parallel Sorting by Regular Sampling (PSRS) [24], a practical parallel sorting algorithm. Notice that parallel sorting means rearranging the elements of the local lists $Z_\rho$, so that each processor $\rho$ still has an evenly balanced amount of elements in $Z_\rho$, but the elements are stored in sorted order by processor $0, 1, \ldots, p-1$.

Step 4. We perform a gather operation, in which the master processor collects the local list $Z_\rho$ from each other processor, and stores each local list $Z_\rho$ in rank order, resulting in a new combined sorted list $L$, i.e. $\sigma(x) \in L[i] \le \sigma(y) \in L[i+1]$, for all $1 \le i \le n-l$. The master processor performs a one-to-all broadcast to send $L$ to all other processors.

Step 5. We make sure that each processor is allocated an evenly balanced amount of query patterns from the set $p_1, p_2, \ldots, p_r$. Assume that we have a degenerate query $p_i[1..l]$, $1 \le i \le r$. Let

$\mathcal{P}_i = p_i[1] \times p_i[2] \times \cdots \times p_i[l]$   (1)

be the Cartesian product. For each $p_i^g \in \mathcal{P}_i$, $1 \le g \le |\Sigma|^\mu$, we can compute signature $\sigma(p_i^g)$. Thus, we extend the set of patterns $p_1, p_2, \ldots, p_r$ to a new set $X = \{p_i^g : 1 \le i \le r, 1 \le g \le |\Sigma|^\mu\}$, where $|X| \le |\Sigma|^\mu r$. In addition to that, if $p_i$ is a weighted pattern (Problem 2), we must check that $\prod_{i=1}^{l} \pi_i(q_j) \ge \gamma$, $q_j \in \Sigma$, $1 \le j \le |\Sigma|$, is satisfied. Then, we can easily check whether $p_i$ occurs in $t$ by using binary search. If there exists a pattern $p_i^g$, such that $p_i^g \in \mathcal{P}_i$ and $\sigma(p_i^g) \in L$, then $p_i$ occurs in $t$.

Theorem 1. Algorithm I solves Problem 1 and Problem 2 for $k = 0$, in $O(\frac{n}{p} \log n + |\Sigma|^\mu \frac{r}{p} \log n)$ computation time and $O(n \log p + r)$ communication time.

Proof.
In step 1, assuming that the text $t$ is kept locally on the master processor, the distribution of $t$ among the available processors can be done in $O(t_s \log p + t_w \frac{n}{p}(p-1))$ communication time. In step 2, each processor creates an evenly balanced amount of signatures of the factors of $t$ in $O(\frac{n}{p})$ computation time. In step 3, the PSRS algorithm can be executed in $O(\frac{n}{p} \log n)$ computation time, where $n \ge p^3$, and $O(n/p)$ communication time [24]. In step 4, the gather operation can be done in $O(t_s \log p + t_w \frac{n}{p}(p-1))$, and the one-to-all broadcast takes $O((t_s + t_w n) \log p)$ communication time. Step 5 runs in $O(|\Sigma|^\mu \frac{r}{p} \log n)$ computation time, for the binary search, and $O(t_s \log p + t_w \frac{r}{p}(p-1))$ communication time, for the distribution of the patterns. Hence, asymptotically, the overall time is $O(\frac{n}{p} \log n + |\Sigma|^\mu \frac{r}{p} \log n)$ computation time, which is $O(\frac{n}{p} \log n + \frac{r}{p} \log n)$ in practice, and $O(n \log p + r)$ communication time.

5. Approximate Pattern Matching

In this section, the focus is to find the occurrences of $p_i$, for all $1 \le i \le r$, in text $t = t[1..n]$ with at most $k$ mismatches, i.e. $k \ge 0$. Here, the idea of using the pigeonhole principle to split each signature into $\nu$ fragments is adopted. By requiring some of the $\nu$ fragments (instead of all of them) to be perfectly matched, the non-candidates can be filtered out very quickly. For example, to admit two mismatches, a pattern can be split into four fragments. The two mismatches can exist in at most two of the fragments (at the same time). Then, if we try all six combinations of the two fragments as the seed, we can catch all hits with two mismatches.

Lemma 2. Given the number of fragments $\nu$ of a string $x = \{x_1, x_2, \ldots, x_\nu\}$, and the number of allowed mismatches $k$, $k < \nu$, any of the $k$ mismatches cannot exist, at the same time, in at least $\nu - k$ fragments of $x$.

Proof. Immediate from the pigeonhole principle.

Without loss of generality, we choose $\nu$ such that $\nu - k = k$ and $2l/\nu \le w$, where $w$ is the size of the computer word. We denote as $c_j(\sigma(x)) = \{\sigma(x_{a_1}), \sigma(x_{a_2}), \ldots, \sigma(x_{a_{\nu-k}})\}$, with $a_1 < a_2 < \ldots < a_{\nu-k}$, the $\binom{\nu}{\nu-k}$ combinations of $\sigma(x) = \{\sigma(x_1), \sigma(x_2), \ldots, \sigma(x_\nu)\}$, such that if $c_{j+1} = \{\sigma(x_{b_1}), \sigma(x_{b_2}), \ldots, \sigma(x_{b_{\nu-k}})\}$, then $\sum_{i=1}^{\nu-k} a_i \le \sum_{i=1}^{\nu-k} b_i$, for all $1 \le j < \binom{\nu}{\nu-k}$. We denote as $d_q(\sigma(x)) = \{\sigma(x_{a_1}), \sigma(x_{a_2}), \ldots, \sigma(x_{a_k})\}$, with $a_1 < a_2 < \ldots < a_k$, the remaining $k$ fragments of $\sigma(x)$, such that if $d_{q+1} = \{\sigma(x_{b_1}), \sigma(x_{b_2}), \ldots, \sigma(x_{b_k})\}$, then $\sum_{i=1}^{k} a_i \le \sum_{i=1}^{k} b_i$, for all $1 \le q < \binom{\nu}{\nu-k}$.

Our aim is to preprocess the text $t$ and construct $\binom{\nu}{\nu-k}$ local lists $L_j$, for all $1 \le j \le \binom{\nu}{\nu-k}$, on each processor. Each list $L_j$ holds elements of type $e_i^j = (i, s_i^j, next_i^j)$, where $i$ represents the starting position of factor $z_i = t[i..i+l-1]$, $s_i^j$ is the concatenated fragments of $c_j(\sigma(z_i))$, and $next_i^j$ points to $e_i^q = (i, s_i^q, next_i^q)$, where $s_i^q$ is the remaining concatenated fragments $d_q(\sigma(z_i))$ of $\sigma(z_i)$, in $L_q$.

We define the following operations:

$f(j)$: a function that, given $j$, returns $q$, such that if $c_j(\sigma(x)) = \{\sigma(x_{a_1}), \sigma(x_{a_2}), \ldots, \sigma(x_{a_{\nu-k}})\}$ and $d_q(\sigma(x)) = \{\sigma(x_{b_1}), \sigma(x_{b_2}), \ldots, \sigma(x_{b_k})\}$, then $c_j \cup d_q = \{\sigma(x_1), \sigma(x_2), \ldots, \sigma(x_\nu)\}$, for some string $x$.

$h(c_i(\sigma(x)))$: a function that, given $c_i(\sigma(x))$, returns $s$, the concatenated fragments of $c_i(\sigma(x))$, for some string $x$.

$bs(s, L_j)$: a binary search operation that, given a signature $s$ and the list $L_j$, returns $\{e^j_{a_1}, e^j_{a_2}, \ldots, e^j_{a_v}\}$ such that $s \in e^j_{a_i}$, for all $1 \le i \le v$.

$bitop(\sigma(x), \sigma(y))$: a word-level operation that, given two strings $x$ and $y$, with $|x| = |y| = \alpha$ and $2\alpha \le w$, where $w$ is the size of the computer word, returns $\delta_H(x, y)$ in constant time.

An outline of Algorithm II is as follows.

Problem Partitioning. We use a data decomposition approach to partition the text $t$ with the sliding window mechanism into a set of factors $z_1, z_2, \ldots, z_{n-l+1}$ of length $l$, where $z_i = t[i..i+l-1]$, for all $1 \le i \le n-l+1$.

Step 1. We assume that text $t$ is stored locally on the master processor. We make sure that the load is evenly balanced by distributing the factors of $t$ among the $p$ available processors. Without loss of generality, we denote as $z_{first_\rho}, \ldots, z_{last_\rho}$ the set of the allocated factors of processor $\rho$, for all $0 \le \rho < p$.

Step 2. Each processor $\rho$, for all $0 \le \rho < p$, computes $\sigma(z_i)$, for all $first_\rho \le i \le last_\rho$. Then, it computes $c_j(\sigma(z_i))$, $q = f(j)$, $d_q(\sigma(z_i))$, for all $1 \le j \le \binom{\nu}{\nu-k}/2$, and adds $e_i^j = (i, h(c_j(\sigma(z_i))), i)$ and $e_i^q = (i, h(d_q(\sigma(z_i))), i)$ to the local lists $Z_\rho^j$ and $Z_\rho^q$, respectively. In this way, we construct $\binom{\nu}{\nu-k}$ local lists $Z_\rho^j$, for all $1 \le j \le \binom{\nu}{\nu-k}$, on each processor $\rho$, for all $0 \le \rho < p$.

Step 3. We sort the elements of the local lists $Z_\rho^j$, for all $1 \le j \le \binom{\nu}{\nu-k}$, for all $0 \le \rho < p$, based on the signature field, in parallel, using PSRS, ensuring that, in the case that we swap elements, we preserve that $next_i^j$ still points to $e_i^q$ and $next_i^q$ points to $e_i^j$.

Step 4.
We perform a gather operation, in which the master processor collects the local lists $Z_\rho^j$, for all $1 \le j \le \binom{\nu}{\nu-k}$, from each other processor, and stores each local list $Z_\rho^j$ in rank order, resulting in a new combined sorted list $L_j$, for all $1 \le j \le \binom{\nu}{\nu-k}$. The master processor performs a one-to-all broadcast to send $L_j$ to all other processors.

Step 5. We make sure that each processor is allocated an evenly balanced amount of query patterns from the set $p_1, p_2, \ldots, p_r$. Assume that we have a degenerate query $p_i[1..l]$, $1 \le i \le r$. Let

$\mathcal{P}_i = p_i[1] \times p_i[2] \times \cdots \times p_i[l]$   (2)

be the Cartesian product. For each $p_i^g \in \mathcal{P}_i$, $1 \le g \le |\Sigma|^\mu$, we can compute signature $\sigma(p_i^g)$. Thus, we extend the set of patterns $p_1, p_2, \ldots, p_r$ to a new set $X = \{p_i^g : 1 \le i \le r, 1 \le g \le |\Sigma|^\mu\}$, where $|X| \le |\Sigma|^\mu r$. In addition to that, if $p_i$ is a weighted pattern (Problem 2), we must check that $\prod_{i=1}^{l} \pi_i(q_j) \ge \gamma$, $q_j \in \Sigma$, $1 \le j \le |\Sigma|$, is satisfied. We compute $\sigma(p_i^g)$, $c_j(\sigma(p_i^g))$, $q = f(j)$ and $d_q(\sigma(p_i^g))$, for all $1 \le j \le \binom{\nu}{\nu-k}$. Then, we perform the binary search operation $bs(h(c_j(\sigma(p_i^g))), L_j)$, which returns $\{e^j_{a_1}, e^j_{a_2}, \ldots, e^j_{a_v}\}$. If there exists $e^j_{a_\lambda}$, for some $1 \le \lambda \le v$, such that $bitop(h(d_q(\sigma(p_i^g))), s^q_{a_\lambda}) \le k$ and $s^q_{a_\lambda} \in e^j_{a_\lambda}$ (reached through the pointer $next^j_{a_\lambda}$), then $p_i$ occurs in $t$ with at most $k$ mismatches.

Theorem 3. Algorithm II solves Problem 1 and Problem 2 in $O(\nu \frac{n}{p} \log n + |\Sigma|^\mu \frac{r}{p} \binom{\nu}{\nu-k} \log n)$ computation time, and $O(\nu n \log p + r)$ communication time.

Proof. In step 1, assuming that the text $t$ is kept locally on the master processor, the distribution of $t$ among the available processors can be done in $O(t_s \log p + t_w \frac{n}{p}(p-1))$ communication time. In step 2, each processor creates an evenly balanced amount of signatures of the factors of $t$ in $O(\nu \frac{n}{p})$ computation time. In step 3, the PSRS algorithm can be executed in $O(\nu \frac{n}{p} \log n)$ computation time, where $n \ge p^3$, and $O(n/p)$ communication time. In step 4, the gather operation can be done in $O(t_s \log p + t_w \nu \frac{n}{p}(p-1))$, and the one-to-all broadcast takes $O((t_s + t_w \nu n) \log p)$ communication time. Step 5 runs in $O(|\Sigma|^\mu \frac{r}{p} \binom{\nu}{\nu-k} \log n)$ computation time, for the binary search, and $O(t_s \log p + t_w \frac{r}{p}(p-1))$ communication time, for the distribution of the patterns. Hence, asymptotically, the overall time is $O(\nu \frac{n}{p} \log n + |\Sigma|^\mu \frac{r}{p} \binom{\nu}{\nu-k} \log n)$ computation time, which is $O(\nu \frac{n}{p} \log n + \frac{r}{p} \binom{\nu}{\nu-k} \log n)$ in practice, and $O(\nu n \log p + r)$ communication time.

6. Conclusion

In this paper, we have presented parallel algorithms to tackle the data emerging from the next-generation sequencing technologies.
These new technologies produce a huge number of very short sequences, and these sequences need to be classified, tagged and recognised as parts of a reference genome. Very often, these sequences either contain degenerate symbols, or they are accompanied by the occurrence probability of each symbol at each position of the sequence. To the best knowledge of the authors, the presented algorithms are the first proposed approach that makes use of the message-passing parallelism model for the problem of mapping these sequences to a reference genome.

In particular, we have presented Algorithm I for exact matching. It runs in $O(\frac{n}{p} \log n + |\Sigma|^\mu \frac{r}{p} \log n)$ computation time, and $O(n \log p + r)$ communication time, where $p$ is the number of available processors, $n$ is the length of the genomic sequence, $r$ is the number of patterns, $\Sigma$ is the DNA alphabet, and $\mu$ is the maximum number of non-solid symbols of a pattern. In addition, we have presented Algorithm II for approximate matching. It runs in $O(\nu \frac{n}{p} \log n + |\Sigma|^\mu \frac{r}{p} \binom{\nu}{\nu-k} \log n)$ computation time, and $O(\nu n \log p + r)$ communication time, where $\nu$ is the number of equal-length fragments into which each pattern is split, and $k$ is the number of allowed mismatches.

The sequential versions of Algorithm I and Algorithm II were implemented on a real dataset and presented in [6] and [9], giving some very promising results compared to more traditional approaches. Our immediate target is to build a software tool, which will be based on the presented algorithms, and will be used by biologists for mapping short sequences to a reference genome.

References

[1] http://
[2] http://www3.appliedbiosystems.com.
[3] http://
[4] http://
[5] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic Local Alignment Search Tool. Journal of Molecular Biology, 215(3), October.
[6] P. Antoniou, C. S. Iliopoulos, L. Mouchard, and S. P. Pissis. Algorithms for mapping short degenerate and weighted sequences to a reference genome. Int. J. Computational Biology and Drug Design, 2(4).
[7] M. Burrows and D. Wheeler. A block sorting lossless data compression algorithm. Digital Equipment Corporation, Technical Report 124.
[8] A. Cox. ELAND, unpublished.
[9] K. Frousios, C. S. Iliopoulos, L. Mouchard, S. P. Pissis, and G. Tischler. REAL: An efficient REad ALigner for next generation sequencing reads. In Proceedings of the 1st International Conference On Bioinformatics and Computational Biology (ACM-BCB 2010), (in press).
[10] H. Jiang and W. H. Wong. SeqMap: mapping massive amount of oligonucleotides to the genome. Bioinformatics, 24(20).
[11] W. J. Kent. BLAT - the BLAST-like alignment tool. Genome Research, 12.
[12] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3):R25.
[13] H. Li and R. Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14).
[14] H. Li, J. Ruan, and R. Durbin. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research, 18.
[15] R. Li, Y. Li, K. Kristiansen, and J. Wang. SOAP: short oligonucleotide alignment program. Bioinformatics, 24(5).
[16] R. Li, C. Yu, Y. Li, T.-W. Lam, S.-M. Yiu, K. Kristiansen, and J. Wang. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics, 25(16).
[17] R. Li, H. Zhu, J. Ruan, W. Qian, X. Fang, Z. Shi, Y. Li, S. Li, G. Shan, K. Kristiansen, S. Li, H. Yang, J. Wang, and J. Wang. De novo assembly of human genomes with massively parallel short read sequencing. Genome Research, 20(2), February.
[18] J. R. Miller, S. Koren, and G. Sutton. Assembly algorithms for next-generation sequencing data. Genomics, doi: /j.ygeno
[19] S. B. Ng, K. J. Buckingham, C. Lee, A. W. Bigham, H. K. Tabor, K. M. Dent, C. D. Huff, P. T. Shannon, E. W. Jabs, D. A. Nickerson, J. Shendure, and M. J. Bamshad. Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics, 42(1):30–35, January.
[20] W. R. Pearson and D. J. Lipman. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA, 85.
[21] S. M. Rumble, P. Lacroute, A. V. Dalca, M. Fiume, A. Sidow, and M. Brudno. SHRiMP: Accurate Mapping of Short Color-space Reads. PLoS Computational Biology, 5(5), May.
[22] F. Sanger and A. R. Coulson. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. Journal of Molecular Biology, 94.
[23] F. Sanger, S. Nicklen, and A. R. Coulson. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA, 74.
[24] H. Shi and J. Schaeffer. Parallel sorting by regular sampling. J. Parallel Distrib. Comput., 14(4).
[25] J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein, S. J. M. Jones, and I. Birol. Abyss: A parallel assembler for short read sequence data. Genome Research, 19(6), June.
[26] A. Smith, Z. Xuan, and M. Zhang. Using quality scores and longer reads improves accuracy of solexa read mapping. BMC Bioinformatics, 9(1):128.
[27] J. R. ten Bosch and W. W. Grody. Keeping up with the next generation: Massively parallel sequencing in clinical diagnostics. Journal of Molecular Diagnostics, 10(6).
[28] T. D. Wu and S. Nacu. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics, 26.
[29] H. Xiang, J. Zhu, Q. Chen, F. Dai, X. Li, M. Li, H. Zhang, G. Zhang, D. Li, Y. Dong, L. Zhao, Y. Lin, D. Cheng, J. Yu, J. Sun, X. Zhou, K. Ma, Y. He, Y. Zhao, S. Guo, M. Ye, G. Guo, Y. Li, R. Li, X. Zhang, L. Ma, K. Kristiansen, Q. Guo, J. Jiang, S. Beck, Q. Xia, W. Wang, and J. Wang. Single base-resolution methylome of the silkworm reveals a sparse epigenomic map. Nature Biotechnology, 28(5):516–20, 2010.


More information

EXACTLY PERIODIC SUBSPACE DECOMPOSITION BASED APPROACH FOR IDENTIFYING TANDEM REPEATS IN DNA SEQUENCES

EXACTLY PERIODIC SUBSPACE DECOMPOSITION BASED APPROACH FOR IDENTIFYING TANDEM REPEATS IN DNA SEQUENCES EXACTLY ERIODIC SUBSACE DECOMOSITION BASED AROACH FOR IDENTIFYING TANDEM REEATS IN DNA SEUENCES Ravi Guta, Divya Sarthi, Ankush Mittal, and Kuldi Singh Deartment of Electronics & Comuter Engineering, Indian

More information

A randomized sorting algorithm on the BSP model

A randomized sorting algorithm on the BSP model A randomized sorting algorithm on the BSP model Alexandros V. Gerbessiotis a, Constantinos J. Siniolakis b a CS Deartment, New Jersey Institute of Technology, Newark, NJ 07102, USA b The American College

More information

The Graph Accessibility Problem and the Universality of the Collision CRCW Conflict Resolution Rule

The Graph Accessibility Problem and the Universality of the Collision CRCW Conflict Resolution Rule The Grah Accessibility Problem and the Universality of the Collision CRCW Conflict Resolution Rule STEFAN D. BRUDA Deartment of Comuter Science Bisho s University Lennoxville, Quebec J1M 1Z7 CANADA bruda@cs.ubishos.ca

More information


More information

Parallel Quantum-inspired Genetic Algorithm for Combinatorial Optimization Problem

Parallel Quantum-inspired Genetic Algorithm for Combinatorial Optimization Problem Parallel Quantum-insired Genetic Algorithm for Combinatorial Otimization Problem Kuk-Hyun Han Kui-Hong Park Chi-Ho Lee Jong-Hwan Kim Det. of Electrical Engineering and Comuter Science, Korea Advanced Institute

More information

#A64 INTEGERS 18 (2018) APPLYING MODULAR ARITHMETIC TO DIOPHANTINE EQUATIONS

#A64 INTEGERS 18 (2018) APPLYING MODULAR ARITHMETIC TO DIOPHANTINE EQUATIONS #A64 INTEGERS 18 (2018) APPLYING MODULAR ARITHMETIC TO DIOPHANTINE EQUATIONS Ramy F. Taki ElDin Physics and Engineering Mathematics Deartment, Faculty of Engineering, Ain Shams University, Cairo, Egyt

More information