11/3/13. Indexing techniques. Short-read mapping software. Indexing a text (a genome, etc) Some terminologies. Hashing
I519 Introduction to Bioinformatics, 2013
Indexing techniques
Yuzhen Ye (yye@indiana.edu)
School of Informatics & Computing, IUB

Contents
- We have seen an indexing technique used in BLAST
- Applications that rely on an efficient indexing technique
  - Comparing short reads against reference genomes (e.g., RNA-Seq data analysis)
  - Comparing large genomes (e.g., MUMmer)
- Indexing techniques
  - Hash table
  - Suffix tree & suffix array
  - BWT

Short-read mapping software
Software           Technique        Developer
Eland              Hashing reads    Illumina
SOAP               Hashing refs     BGI
Maq                Hashing reads    Sanger (Li, Heng)
Bowtie & Bowtie2   BWT              Salzberg/UMD
BWA                BWT              Sanger (Li, Heng)
SOAP2              BWT & hashing    BGI
nextgenerationsequencing.html (under Alignment category)

Indexing a text (a genome, etc.)
- Example 1: we want to index a genome such that we can look up any k-mer along the genome in O(1) time (without scanning the whole genome).
- Example 2: we want to index a protein database such that we can look up all the proteins containing a word (k-mer) in constant time.
- All need to use an efficient indexing technique

Hashing
Hashing is an indexing technique that enables fast search by computing the index directly from the key.
[Figure: a hash table whose keys (e.g., ASTTSA, ASTTSS, ASTTST) map to indexes, each holding a list of protein values.]

Some terminologies
- The process of finding a record using some computation to map its key value to a position in the array is called hashing.
- The function that maps key values to positions is called a hash function (h).
- The array that holds the records is called the hash table (HT).
- A position in the hash table is also known as a slot.
[Figure: hash function mapping a key to an index.]
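Example 1 above (looking up any k-mer without scanning the genome) can be sketched with Python's built-in dict, which is itself a hash table. This is a minimal illustration, not the scheme used by any of the listed mappers; the function name `build_kmer_index` and the toy genome are assumptions made for the example.

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int) -> dict:
    """Map every k-mer in `genome` to the list of its 0-based start positions."""
    index = defaultdict(list)
    for start in range(len(genome) - k + 1):
        index[genome[start:start + k]].append(start)
    return dict(index)


genome = "ATCGATCGTA"                # toy sequence, made up for illustration
index = build_kmer_index(genome, 3)  # one pass over the genome builds the index
print(index.get("TCG", []))          # start positions of "TCG"
print(index.get("GGG", []))          # a k-mer that does not occur
```

Building the index takes one pass over the genome; each later lookup is an expected constant-time dictionary access plus the cost of hashing a k-character key.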
Hash functions and collisions
- The function that maps key values to positions is called a hash function (h).
- Typically there are many more values in the key range than there are slots in the hash table.
- Given a hash function h and two keys k1 and k2, if h(k1) = h(k2) = β, we say that k1 and k2 have a collision at slot β under hash function h.
- Perfect hashing is a system in which records are hashed such that there are no collisions (e.g., indexing k-mers when k is small).
- An ideal hash function stores the actual records in the collection such that each slot in the hash table has equal probability of being filled; in practice, clustering of records happens (many records hash to only a few of the slots).

A simple hash function for integers
A function used to hash integers to a table of 16 slots:

    int h(int x) {
        return (x % 16);
    }

The value returned by this hash function depends solely on the least significant four bits of the key. These bits are likely to be poorly distributed (as an example, a high percentage of the keys might be even numbers, so the low-order bit is zero), so the result may also be poorly distributed.

A simple hash function for k-mers
The strings are DNA sequences.

k = 1
index   key
0       A
1       T
2       C
3       G

    int h(string x, int k):
        baseint = {'A': 0, 'T': 1, 'C': 2, 'G': 3}
        add = 1
        idx = 0
        for i in k to 1:
            b = x[i]
            idx = idx + baseint[b] * add
            add = add * 4
        return idx

h("CA", 2) = 8

k = 2
index   key      index   key
0       AA       8       CA
1       AT       9       CT
2       AC       10      CC
3       AG       11      CG
4       TA       12      GA
5       TT       13      GT
6       TC       14      GC
7       TG       15      GG

How many slots does the table need for a given k? (4^k; for example, 16 slots for k = 2.)

Collision resolution
- While the goal of a hash function is to minimize collisions, some collisions are unavoidable in practice.
- Hashing implementations must include some form of collision resolution policy.
- Two classes of collision resolution techniques:
  - Open hashing (separate chaining): collisions are stored outside the table.
  - Closed hashing: collisions result in storing one of the records at another slot in the table.

Open hashing
The simplest form of open hashing defines each slot in the hash table to be the head of a linked list. All records that hash to a particular slot are placed on that slot's linked list.
Records within a slot's list can be ordered in several ways: by insertion order, by key value order, or by frequency-of-access order. The average cost for hashing should be Θ(1); however, if clustering of records exists, then the cost to access a record can be much higher because many elements on the linked list must be searched.

Closed hashing
- Closed hashing stores all records directly in the hash table.
- A collision resolution policy must be built to determine which slot to use when a collision is detected.
- The same policy must be followed during search as during insertion.
- Some common closed hashing schemes:
  - Bucket hashing: overflow goes to an overflow bucket
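The k-mer hash function and open hashing (separate chaining) described above might be sketched as follows. This is an illustrative sketch only: the names `kmer_hash` and `ChainedHashTable` are hypothetical, Python lists stand in for linked lists, and the table reuses the simple modulo hash for integers with 16 slots.

```python
BASE_TO_INT = {"A": 0, "T": 1, "C": 2, "G": 3}  # encoding from the k-mer example


def kmer_hash(kmer: str) -> int:
    """Perfect hash for DNA k-mers: read the k-mer as a base-4 number."""
    idx = 0
    for base in kmer:
        idx = idx * 4 + BASE_TO_INT[base]
    return idx


class ChainedHashTable:
    """Open hashing: each slot holds a chain (list) of (key, value) records."""

    def __init__(self, num_slots: int = 16):
        self.slots = [[] for _ in range(num_slots)]

    def _h(self, key: int) -> int:
        # Simple integer hash: with 16 slots only the low four bits matter.
        return key % len(self.slots)

    def insert(self, key: int, value) -> None:
        chain = self.slots[self._h(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing record
                return
        chain.append((key, value))       # chain kept in insertion order

    def search(self, key: int):
        # Same hash as during insertion, then a linear scan of one chain.
        for existing_key, value in self.slots[self._h(key)]:
            if existing_key == key:
                return value
        return None


table = ChainedHashTable()
for key in (3, 19, 35):                  # all three collide: key % 16 == 3
    table.insert(key, f"record-{key}")
print(kmer_hash("CA"))                   # 8, matching h("CA", 2) = 8
print(len(table.slots[3]))               # 3 records chained in slot 3
print(table.search(19))
```

The three colliding keys show why clustering hurts: a search for any of them has to walk the same chain, so the cost grows with the chain length rather than staying at Θ(1).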
Suffix tree
- In CS, a suffix tree is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values.
- A suffix tree allows one to find, extremely efficiently, all distinct substrings in a given sequence.
- There are efficient algorithms to construct suffix trees, given by Weiner (1973) and McCreight (1976) (in linear time).
- For the task of comparing two DNA sequences, suffix trees allow one to quickly find all substrings shared by the two inputs. The genome alignment is then built upon this information.

Suffix tree of a short sequence
- A leaf is a unique suffix
- An internal node is a repeated sequence in the original string
[Figure: suffix trees for ATCGTA# (suffixes #, A#, TA#, GTA#, CGTA#, TCGTA#, ATCGTA#) and ATCGAT$ (suffixes $, T$, AT$, GAT$, CGAT$, TCGAT$, ATCGAT$), with leaves labeled by suffix start positions.]

Matching two sequences
[Figure: generalized suffix tree of ATCGTA# and ATCGAT$.]
- ATCG is the longest common substring
- Every unique matching sequence is represented by an internal node with exactly two child nodes, such that the child nodes are leaf nodes from different genomes
- Applied in MUMmer

MUMmer method
MUMmer combines suffix trees, the longest increasing subsequence (LIS) and SW alignment.
- Maximal Unique Match (MUM) Identification
  - Identify the longest strings in Genome 1 that have one identical match in Genome 2
  - Naïve method: quadratic or worse in N
  - Using a suffix tree: O(N)
- Ordered MUM Selection
  - Identify the longest set of MUMs such that they occur in order in each of the genomes (using a variation of the well-known algorithm to find the LIS of a sequence of integers)
- Processing Non-matched Regions
  - Classify non-matched regions as either insertions, SNPs or highly polymorphic regions

A toy example of string (pattern) matching
T = xabxac
suffixes = {xabxac, abxac, bxac, xac, ac, c}
[Figure: suffix tree of T with two example patterns matched against it.]
Preprocess text T, not pattern P.

Suffix tree for string matching
- Preprocess text T, not pattern P
  - O(m) preprocessing time (m: the length of the text)
  - O(n + k) search time (n: the length of the pattern; k: the number of occurrences of P in T)
- Match pattern P against the tree starting at the root until
  - Case 1: P is completely matched. Every leaf below this match point is the starting location of P in T.
  - Case 2: No match is possible. P does not occur in T.
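The two matching cases above can be illustrated with an uncompressed suffix trie built from nested dictionaries. This is only a sketch under stated assumptions: it is not a linear-time suffix tree (construction here is quadratic in the text length), the function names are hypothetical, and the toy text `xabxac$` with terminator `$` is chosen for illustration.

```python
def build_suffix_trie(text: str) -> dict:
    """Insert every suffix of `text`; the key None marks a leaf and stores
    the 0-based start position of the suffix that ends there."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        node[None] = start
    return root


def find_occurrences(root: dict, pattern: str) -> list:
    """Walk `pattern` down from the root, then collect all leaves below."""
    node = root
    for ch in pattern:
        if ch not in node:
            return []                    # Case 2: no match is possible
        node = node[ch]
    positions, stack = [], [node]        # Case 1: pattern completely matched
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key is None:
                positions.append(child)  # a leaf: start position of a suffix
            else:
                stack.append(child)
    return sorted(positions)


trie = build_suffix_trie("xabxac$")      # "$" makes every suffix end at a leaf
print(find_occurrences(trie, "xa"))      # start positions of "xa"
print(find_occurrences(trie, "xb"))      # pattern absent from the text
```

The walk touches one node per pattern character, and the traversal below the match point visits only the subtree whose leaves are the occurrences. In a true (compressed) suffix tree that subtree has size proportional to the number of occurrences, which gives the O(n + k) search bound; the uncompressed trie used here does not guarantee that.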
Suffix array
- A suffix array is a space-efficient data structure, more compact than a suffix tree
- The suffix array is basically a sorted array of the start positions of all the suffixes of a text. The start positions are sorted in lexicographical (alphabetical) order of the suffixes.
- A suffix array for a text of length n can be built in O(n log n) time.
- Searching the text for a pattern of length m can be done in O(m log n) time by binary search; this is reduced to O(m + log n) if using the LCP (longest common prefix).

Text: A B R A C A D A B R A #

Position   Suffix
11         #
10         A#
7          ABRA#
0          ABRACADABRA#
3          ACADABRA#
5          ADABRA#
8          BRA#
1          BRACADABRA#
4          CADABRA#
6          DABRA#
9          RA#
2          RACADABRA#

Search in suffix array
Binary search in a suffix array: in each of the O(log n) comparisons, the input pattern P is compared to the current entry of the suffix array, which means a full string comparison of up to m characters (the whole pattern). So the complexity is O(m log n). The complexity is reduced to O(m + log n) when the LCP is used.

Longest common prefix (LCP)
- LCP[i] is the length of the longest common prefix between the suffixes starting from SA[i-1] and SA[i].
- It keeps track of the length of the longest common prefix between two consecutive suffixes of S when arranged in lexicographic order.
- Linear-time longest-common-prefix computation in suffix arrays and its applications. Kasai et al., 2001

How does the LCP help?
Assume that at one step of the binary search, [L,...,R] is the range of the suffix array with central point M, and P is compared to suffix SA[M]. Assume P and the corresponding suffix share the first k characters, and P is lexicographically larger than suffix SA[M], so, in the next step, [M,...,R] is considered and a new central point M' needs to be determined: M ... M' ... R. We know lcp(P, M) == k; also, LCP-LR is precomputed such that an O(1) lookup gives lcp(M, M'), the longest common prefix of M and M'. Now there are three possibilities:
- Case 1: k < lcp(M, M'), i.e. P has fewer prefix characters in common with M than M has in common with M'. This means the (k+1)-th character of M' is the same as that of M, and since P is lexicographically larger than M, it must be lexicographically larger than M', too. So we continue in the right half [M',...,R].
- Case 2: k > lcp(M, M'), i.e. P has more prefix characters in common with M than M has in common with M'. Consequently, if we were to compare P to M', the common prefix would be smaller than k, and M' would be lexicographically larger than P, so, without actually making the comparison, we continue in the left half [M,...,M'].
- Case 3: k == lcp(M, M'). So M and M' are both identical with P in the first k characters. To decide whether we continue in the left or right half, it suffices to compare P to M' starting from the (k+1)-th character.
The consequence is that no character of P is compared to any character of the text more than once. The total number of character comparisons is bounded by m, so the total complexity is indeed O(m + log n).

Burrows-Wheeler Transform
- Burrows M & Wheeler D (1994)
- Reversible permutation of text to allow better compression (e.g. bzip2)
- Algorithms exist to perform fast search on BW-transformed data

Burrows-Wheeler Transform (BWT)
[Figure: the rotations of the text acaacg$ sorted in lexicographical order form the Burrows-Wheeler Matrix (BWM); the row order gives the suffix array, and the last column of the BWM is the BWT, gc$aaac.]
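The suffix array, its plain binary search, and its link to the BWT can be sketched in a few lines. This is a minimal sketch under stated assumptions: construction sorts whole suffixes (simple, but slower than the O(n log n) builders mentioned above), the search is the O(m log n) version without the LCP speed-up, and the function names are hypothetical.

```python
def build_suffix_array(text: str) -> list:
    """Start positions of all suffixes, sorted by the suffixes themselves."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list, pattern: str) -> list:
    """Two binary searches over `sa`, each comparing up to len(pattern) chars."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                       # first suffix whose m-prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:                       # first suffix whose m-prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def bwt_from_suffix_array(text: str, sa: list) -> str:
    """BWT[i] is the character preceding suffix SA[i] (wrapping around at 0)."""
    return "".join(text[i - 1] for i in sa)


text = "ABRACADABRA#"                    # "#" sorts before the letters in ASCII
sa = build_suffix_array(text)
print(sa)                                # [11, 10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]
print(find_occurrences(text, sa, "ABRA"))
print(bwt_from_suffix_array(text, sa))
```

Because all suffixes sharing a prefix are adjacent in sorted order, the occurrences of a pattern form one contiguous interval of the suffix array; the same interval idea carries over to searching with the BWT.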
Why Burrows-Wheeler?
- BWT is very compact (it was developed for compression purposes)
  - Approximately ½ byte per base (by contrast, an integer number may take 4 bytes)
  - As large as the original text, plus a few extras (the FM indices)
  - Can fit onto a standard computer with a few GB of memory (for indexing the human genome)
- Linear-time search algorithm
  - Proportional to the length of the query for exact matches

Mississippi example
Text: mississippi$
[Figure: the rotations of mississippi$ sorted lexicographically; the first column is F = $iiiimppssss and the last column is the BWT, ipssm$pissii.]

FM indices
- C table: C[c] = # of text characters which are alphabetically smaller than c

  c      $   i   m   p   s
  C[c]   0   1   5   6   8

  e.g., C['m'] = 5 (4 'i' + 1 '$')
- Occ function Occ(c, q): # of occurrences of character c in the prefix bwt[1, q]
  e.g., Occ('s', 10) = 4 (4 's' in ipssm$piss)

Last to front mapping
- LF() = last to front mapping: the character bwt[i] is located in the first column F at position LF(i); i.e., bwt[i] = F[LF(i)]
- LF(i) = C[bwt[i]] + Occ(bwt[i], i)
- e.g., LF(10) = C['s'] + Occ('s', 10) = 8 + 4 = 12. Both bwt[10] and F[12] correspond to the first s in mississippi.
- Only bwt is stored; F is shown for demonstration purposes

Reversible transform (using LF mapping) to recover the text
[Figure: C table and Occ table for the text acaacg$.]
Starting from the row that begins with $, whose BWT character is the last character in the original text, repeatedly apply LF(i) = C[bwt[i]] + Occ(bwt[i], i) to move to the row of the preceding character; the text is recovered from its last character to its first.
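The C table, the Occ function and the LF mapping above can be checked on the mississippi example with a short sketch. The code below is illustrative only: the function names are hypothetical, Occ is computed by a plain scan rather than from the constant-time structures of a real FM index, and rows are numbered from 1 to match the example.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Last column of the sorted rotation matrix; text must end with '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def c_table(bwt_str: str) -> dict:
    """C[c] = number of characters in the text that are smaller than c."""
    counts, table, total = Counter(bwt_str), {}, 0
    for ch in sorted(counts):
        table[ch] = total
        total += counts[ch]
    return table


def occ(bwt_str: str, ch: str, q: int) -> int:
    """Occurrences of ch in the 1-based prefix bwt[1..q] (naive scan)."""
    return bwt_str[:q].count(ch)


def lf(bwt_str: str, C: dict, i: int) -> int:
    """1-based last-to-front mapping: LF(i) = C[bwt[i]] + Occ(bwt[i], i)."""
    ch = bwt_str[i - 1]
    return C[ch] + occ(bwt_str, ch, i)


def inverse_bwt(bwt_str: str) -> str:
    """Recover the text backwards, starting from row 1 (the '$' rotation)."""
    C, row, chars = c_table(bwt_str), 1, []
    for _ in range(len(bwt_str) - 1):
        chars.append(bwt_str[row - 1])   # character preceding the current one
        row = lf(bwt_str, C, row)
    return "".join(reversed(chars)) + "$"


b = bwt("mississippi$")
C = c_table(b)
print(b)                                 # ipssm$pissii
print(C["s"], occ(b, "s", 10), lf(b, C, 10))   # 8 4 12
print(inverse_bwt(b))                    # mississippi$
```

The inversion never needs the first column F or the original text: the BWT string plus the C table is enough, which is what makes the transform reversible.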
Searching in BWT-compressed text: counting the occurrences
- Algorithm for counting the number of occurrences of P[1; p] in T[1; u]: the backward search algorithm
- At the i-th phase, the parameter sp points to the first row of M prefixed by P[i; p] and the parameter ep points to the last row of M prefixed by P[i; p].
- Ferragina and Manzini showed that it is possible to compute Occ(c, 1, k) in constant time. Occ(c, 1, k) is sometimes written Occ(c, k): # of occurrences of character c in the prefix 1 to k.

Searching in BWT-compressed text: determining the occurrences (locations)
- We know how to count the occurrences; then, for s = sp, sp + 1, ..., ep, we need to find the position pos(s) in T of the suffix which prefixes the s-th row M[s]. Two methods were proposed.
- The first one is simple and relies on marking a subset of the indices in the BWT with their positions in the suffix array. If bwt[j] has a position associated with it, locate(j) is trivial. If not, the string is followed with LF(i) (last-to-front mapping) until an associated index is found. Locate can be implemented to find the occ occurrences of pattern P[1..p] in text T[1..u] in O(p + occ log^ε u) time.
- The second method is faster and relies on the very special properties of the string T^bw and on a different compression algorithm.
Ferragina and Manzini, 2000. Opportunistic Data Structures with Applications.

Exact match: a simple example
[Figure and reference not preserved in the transcript.]

Exact match (another example)
BWT(agcagcagact) = tgcc$ggaaaac
[Figure: backward search for a pattern, shown step by step on the sorted rotations of agcagcagact$ ($agcagcagact, act$agcagcag, agact$agcagc, agcagact$agc, agcagcagact$, ...).]
Test with your own sequence and pattern at: [URL not preserved in the transcript]

Inexact match
For the exact matching problem, we only need to check one SA interval; for the inexact matching problem, there may be many.
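The backward search that counts occurrences, followed by a lookup of their locations, might be sketched as below. Several simplifications are assumptions of this sketch rather than part of the FM index as published: Occ is a naive scan instead of a constant-time lookup, the full suffix array is kept instead of a sampled subset, the interval is 0-based and half-open, and all function names are hypothetical.

```python
from collections import Counter


def build_index(text: str):
    """Return (suffix array, BWT, C table) for a text ending with '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt_str = "".join(text[i - 1] for i in sa)
    counts, C, total = Counter(text), {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    return sa, bwt_str, C


def backward_search(bwt_str: str, C: dict, pattern: str) -> tuple:
    """Half-open, 0-based interval [sp, ep) of rows prefixed by `pattern`."""
    sp, ep = 0, len(bwt_str)
    for ch in reversed(pattern):         # process the pattern right to left
        if ch not in C:
            return (0, 0)
        sp = C[ch] + bwt_str[:sp].count(ch)   # naive Occ(ch, sp)
        ep = C[ch] + bwt_str[:ep].count(ch)   # naive Occ(ch, ep)
        if sp >= ep:
            return (0, 0)                # empty interval: no occurrence
    return (sp, ep)


def locate(sa: list, interval: tuple) -> list:
    """Text positions of the matches (here read from the full suffix array)."""
    sp, ep = interval
    return sorted(sa[sp:ep])


text = "agcagcagact$"
sa, bwt_str, C = build_index(text)
interval = backward_search(bwt_str, C, "gca")
print(bwt_str)                           # tgcc$ggaaaac
print(interval[1] - interval[0])         # number of occurrences of "gca"
print(locate(sa, interval))              # their 0-based start positions
```

Each pattern character narrows one interval using only the C table and Occ counts, so counting costs a number of steps proportional to the pattern length, independent of the text length once Occ is answered in constant time.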
Main advantage of BWT against suffix array
- BWT needs less memory than a suffix array
- For the human genome, m = 3 × 10^9 (m is the length of the genome):
  - Suffix array: m log2(m) bits ≈ 4m bytes = 12 GB
  - BWT: m/4 bytes plus extras, a few GB in total
    - m/4 bytes to store the BWT (2 bits per character)
    - The suffix array and the occurrence counts array take a lot more bits
    - In practice, SA and OCC are only partially stored; most elements are computed on demand (takes time!)
    - Tradeoff between time and space
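The memory comparison above is simple arithmetic and can be reproduced as follows. The sketch assumes a four-letter DNA alphabet stored at 2 bits per base, suffix array entries rounded up to a whole number of bits, and decimal gigabytes (10^9 bytes); the extras of a real index (sampled SA and OCC arrays) are not modeled.

```python
import math

m = 3 * 10**9                                # approximate human genome length

bits_per_sa_entry = math.ceil(math.log2(m))  # bits needed to store one position
sa_bytes = m * bits_per_sa_entry // 8        # full suffix array
bwt_bytes = m * 2 // 8                       # BWT string at 2 bits per base

print(bits_per_sa_entry)                     # 32
print(sa_bytes / 10**9)                      # 12.0 (GB)
print(bwt_bytes / 10**9)                     # 0.75 (GB), before the extras
```

The 16-fold gap between the two numbers is the room available for the partially stored SA and OCC arrays, which is where the time and space tradeoff is decided.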
Short-read mapping software
Software           Technique        Developer
Eland              Hashing reads    Illumina
SOAP               Hashing refs     BGI
Maq                Hashing reads    Sanger (Li, Heng)
Bowtie & Bowtie2   BWT              Salzberg/UMD
BWA                BWT              Sanger (Li, Heng)
SOAP2              BWT & hashing    BGI
nextgenerationsequencing.html (under Alignment category)
More informationDistance-Join: Pattern Match Query In a Large Graph Database
Distne-Join: Pttern Mth Query In Lrge Grph Dtbse Lei Zou Huzhong University of Siene nd Tehnology Wuhn, Chin zoulei@mil.hust.edu.n Lei Chen Hong Kong University of Siene nd Tehnology Hong Kong leihen@se.ust.hk
More informationXML and Databases. Exam Preperation Discuss Answers to last year s exam. Sebastian Maneth NICTA and UNSW
XML n Dtses Exm Prepertion Disuss Answers to lst yer s exm Sestin Mneth NICTA n UNSW CSE@UNSW -- Semester 1, 2008 (1) For eh of the following, explin why it is not well-forme XML (is WFC or the XML grmmr
More informationMath 32B Discussion Session Week 8 Notes February 28 and March 2, f(b) f(a) = f (t)dt (1)
Green s Theorem Mth 3B isussion Session Week 8 Notes Februry 8 nd Mrh, 7 Very shortly fter you lerned how to integrte single-vrible funtions, you lerned the Fundmentl Theorem of lulus the wy most integrtion
More informationAP Calculus BC Chapter 8: Integration Techniques, L Hopital s Rule and Improper Integrals
AP Clulus BC Chpter 8: Integrtion Tehniques, L Hopitl s Rule nd Improper Integrls 8. Bsi Integrtion Rules In this setion we will review vrious integrtion strtegies. Strtegies: I. Seprte the integrnd into
More informationCS 2204 DIGITAL LOGIC & STATE MACHINE DESIGN SPRING 2014
S 224 DIGITAL LOGI & STATE MAHINE DESIGN SPRING 214 DUE : Mrh 27, 214 HOMEWORK III READ : Relte portions of hpters VII n VIII ASSIGNMENT : There re three questions. Solve ll homework n exm prolems s shown
More informationWelcome. Balanced search trees. Balanced Search Trees. Inge Li Gørtz
Welome nge Li Gørt. everse tehing n isussion of exerises: 02110 nge Li Gørt 3 tehing ssistnts 8.00-9.15 Group work 9.15-9.45 isussions of your solutions in lss 10.00-11.15 Leture 11.15-11.45 Work on exerises
More information5.2 Exponent Properties Involving Quotients
5. Eponent Properties Involving Quotients Lerning Objectives Use the quotient of powers property. Use the power of quotient property. Simplify epressions involving quotient properties of eponents. Use
More information] dx (3) = [15x] 2 0
Leture 6. Double Integrls nd Volume on etngle Welome to Cl IV!!!! These notes re designed to be redble nd desribe the w I will eplin the mteril in lss. Hopefull the re thorough, but it s good ide to hve
More informationDiscrete Structures, Test 2 Monday, March 28, 2016 SOLUTIONS, VERSION α
Disrete Strutures, Test 2 Mondy, Mrh 28, 2016 SOLUTIONS, VERSION α α 1. (18 pts) Short nswer. Put your nswer in the ox. No prtil redit. () Consider the reltion R on {,,, d with mtrix digrph of R.. Drw
More informationEngineering a Lightweight Suffix Array Construction Algorithm (Extended Abstract)
Engineering Lightweight Suffix Arry Constrution Algorithm (Extended Astrt) Giovnni Mnzini 1,2 nd Polo Ferrgin 3 1 Diprtimento di Informti, Università del Piemonte Orientle I-15100 Alessndri, Itly mnzini@mfn.unipmn.it
More informationOn-Line Construction. of Suffix Trees. Overview. Suffix Trees. Notations. goo. Suffix tries
On-Line Cnstrutin Overview Suffix tries f Suffix Trees E. Ukknen On-line nstrutin f suffix tries in qudrti time Suffix trees On-line nstrutin f suffix trees in liner time Applitins 1 2 Suffix Trees A suffix
More informationCS 360 Exam 2 Fall 2014 Name
CS 360 Exm 2 Fll 2014 Nme 1. The lsses shown elow efine singly-linke list n stk. Write three ifferent O(n)-time versions of the reverse_print metho s speifie elow. Eh version of the metho shoul output
More informationAVL Trees. D Oisín Kidney. August 2, 2018
AVL Trees D Oisín Kidne August 2, 2018 Astrt This is verified implementtion of AVL trees in Agd, tking ides primril from Conor MBride s pper How to Keep Your Neighours in Order [2] nd the Agd stndrd lirr
More informationNecessary and sucient conditions for some two. Abstract. Further we show that the necessary conditions for the existence of an OD(44 s 1 s 2 )
Neessry n suient onitions for some two vrile orthogonl esigns in orer 44 C. Koukouvinos, M. Mitrouli y, n Jennifer Seerry z Deite to Professor Anne Penfol Street Astrt We give new lgorithm whih llows us
More informationThe Regulated and Riemann Integrals
Chpter 1 The Regulted nd Riemnn Integrls 1.1 Introduction We will consider severl different pproches to defining the definite integrl f(x) dx of function f(x). These definitions will ll ssign the sme vlue
More informationReview: The Riemann Integral Review: The definition of R b
eview: The iemnn Integrl eview: The definition of b f (x)dx. For ontinuous funtion f on the intervl [, b], Z b f (x) dx lim mx x i!0 nx i1 f (x i ) x i. This limit omputes the net (signed) re under the
More informationSimilar Right Triangles
Geometry V1.noteook Ferury 09, 2012 Similr Right Tringles Cn I identify similr tringles in right tringle with the ltitude? Cn I identify the proportions in right tringles? Cn I use the geometri mens theorems
More informationXML and Databases. Outline. 1. Top-Down Evaluation of Simple Paths. 1. Top-Down Evaluation of Simple Paths. 1. Top-Down Evaluation of Simple Paths
Outline Leture Effiient XPth Evlution XML n Dtses. Top-Down Evlution of simple pths. Noe Sets only: Core XPth. Bottom-Up Evlution of Core XPth. Polynomil Time Evlution of Full XPth Sestin Mneth NICTA n
More informationI1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3
2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is
More informationTechnische Universität München Winter term 2009/10 I7 Prof. J. Esparza / J. Křetínský / M. Luttenberger 11. Februar Solution
Tehnishe Universität Münhen Winter term 29/ I7 Prof. J. Esprz / J. Křetínský / M. Luttenerger. Ferur 2 Solution Automt nd Forml Lnguges Homework 2 Due 5..29. Exerise 2. Let A e the following finite utomton:
More information5. Every rational number have either terminating or repeating (recurring) decimal representation.
CHAPTER NUMBER SYSTEMS Points to Rememer :. Numer used for ounting,,,,... re known s Nturl numers.. All nturl numers together with zero i.e. 0,,,,,... re known s whole numers.. All nturl numers, zero nd
More informationPolynomial Approximations for the Natural Logarithm and Arctangent Functions. Math 230
Polynomil Approimtions for the Nturl Logrithm nd Arctngent Functions Mth 23 You recll from first semester clculus how one cn use the derivtive to find n eqution for the tngent line to function t given
More informationMAT 403 NOTES 4. f + f =
MAT 403 NOTES 4 1. Fundmentl Theorem o Clulus We will proo more generl version o the FTC thn the textook. But just like the textook, we strt with the ollowing proposition. Let R[, ] e the set o Riemnn
More informationSolutions for HW9. Bipartite: put the red vertices in V 1 and the black in V 2. Not bipartite!
Solutions for HW9 Exerise 28. () Drw C 6, W 6 K 6, n K 5,3. C 6 : W 6 : K 6 : K 5,3 : () Whih of the following re iprtite? Justify your nswer. Biprtite: put the re verties in V 1 n the lk in V 2. Biprtite:
More informationMath 8 Winter 2015 Applications of Integration
Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl
More information