Introduction to Bioinformatics

Outline
- Method without considering the background distribution
- General approach considering the background distribution
- Ways to speed up the algorithm

Transcription Factor Binding Sites (TFBSs) 7/20/17

Transcription Factor Binding Sites (TFBSs)
- DNA sequence segments that transcription factors (TFs) bind to are called transcription factor binding sites (TFBSs).
- TFs interact with their TFBSs using a combination of electrostatic and Van der Waals forces.
- Most TFs bind DNA in a motif-specific manner, i.e. a TF can bind to a list of similar DNA sequence segments.

Transcription Factor Binding Sites (TFBSs)
- Transcription factor binding sites are usually short (around 5-15 bp).
- They are frequently degenerate sequence motifs.
  o The sequence degeneracy confers different levels of regulation.
- Given a genome, the prediction of TFBSs is a difficult and risky task.

Identification of TFBSs
- Experimental methods
  o Traditional methods
    o Foot-printing methods
    o Nitrocellulose binding assays
    o Gel-shift analysis
    o Southwestern blotting
- TFBSs in silico
  o Aim: to identify more candidate target TFBSs.
  o Degenerate consensus sequences (drawback: they do not contain precise likelihood information).
  o A position weight matrix (PWM), or position-specific scoring matrix (PSSM), is a common approach to this problem.
- High-throughput methods
  o Finding high-affinity binding sequences in vitro (SELEX)
  o High-throughput method in vivo: ChIP-chip

Position Weight Matrices (PWMs)

Position weight matrix (PWM) / position-specific weight matrix (PSSM)
A PWM is a commonly used representation of motifs (patterns) in biological sequences.
Imagine two experimentally determined TF binding sites for one TF:
  Seq1: ATTGAGTCGCAGTGACTCAAG
  Seq2: CTTGAGTCAGGCAGGCTCAAT
(Figure: construction of a position weight matrix (PWM).)
(Figure: a PWM of "better quality", constructed using 33 TF binding sites for one TF.)

Definitions:
- Length of the PWM (number of columns): \(M\)
- Absolute PWM (count matrix): \(f^*_{i,j} \in \mathbb{N}\) with \(i \in \{A,T,C,G\}\) and \(j \in [0, M-1]\)
- Relative PWM (frequency matrix):
  \[ f'_{i,j} = \frac{f^*_{i,j}}{\sum_{k \in \{A,T,C,G\}} f^*_{k,j}} \]

Example of a relative PWM (first columns):
       1     2     3
  A    0.8   0     ..
  T    0.1   0     ..
  C    0.1   0.2   ..
  G    0     0.8   ..
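The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `count_matrix` and `frequency_matrix` and the four aligned sites are invented for the example, and a matrix is represented as one dictionary of bases per column.

```python
BASES = "ATCG"

def count_matrix(sites):
    """Absolute PWM f*: one dict of base counts per alignment column."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all binding sites must have the same length")
    return [{b: sum(site[j] == b for site in sites) for b in BASES}
            for j in range(length)]

def frequency_matrix(counts):
    """Relative PWM f': divide each count by its column total."""
    freqs = []
    for column in counts:
        total = sum(column.values())
        freqs.append({b: column[b] / total for b in BASES})
    return freqs

# Hypothetical aligned binding sites, invented for illustration only.
sites = ["ATTGA", "CTTGA", "ATTCA", "ATAGA"]
counts = count_matrix(sites)
freqs = frequency_matrix(counts)
```

Each column of `freqs` sums to 1, which is the property the relative PWM is meant to have.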

A simple TFBS matching tool

Naïve method without considering the background distribution
MATCH(TM): a tool for searching transcription factor binding sites in DNA sequences (A.E. Kel et al. 2003)
- Input: (1) DNA sequences containing potential TF binding sites; (2) a PWM.
- Output: a list of found potential sites.
- Two types of scores are calculated:
  o Core Similarity Score (CSS): calculated only for the core, the first five consecutive conserved positions.
  o Matrix Similarity Score (MSS): calculated for all positions.

Naïve method without considering the background distribution
MATCH(TM): a tool for searching transcription factor binding sites in DNA sequences (A.E. Kel et al. 2003)
\[ \mathrm{MSS}\ (\mathrm{CSS}) = \frac{\mathrm{Current} - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}}, \qquad \mathrm{MSS}\ (\mathrm{CSS}) \in [0,1] \]
\[ \mathrm{Current} = \sum_{j=0}^{L-1} I(j)\, f'_{nu(j),j}, \qquad f'_{i,j} = \frac{f^*_{i,j}}{\sum_{k \in \{A,T,C,G\}} f^*_{k,j}} \]
where nu(j) refers to the nucleotide with index j.
\[ \mathrm{Max} = \sum_{j=0}^{L-1} I(j)\, f^{\max}_j, \qquad f^{\max}_j = \max_i \{ f'_{i,j} \} \]
(highest frequency of a nucleotide in position j in the matrix)
\[ \mathrm{Min} = \sum_{j=0}^{L-1} I(j)\, f^{\min}_j, \qquad f^{\min}_j = \min_i \{ f'_{i,j} \} \]
(lowest frequency of a nucleotide in position j in the matrix)
Information vector:
\[ I(j) = \sum_{i \in \{A,T,G,C\}} f'_{i,j} \ln(4 f'_{i,j}), \qquad j = 0, 1, \ldots, L-1 \]
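As a rough sketch of the score above (not the MATCH implementation itself), the following Python computes the information vector and the normalized similarity score for one window. The function names are hypothetical, the two-column matrix reuses the first two columns of the example frequency table, and terms with frequency 0 are taken to contribute 0 to I(j).

```python
import math

def information_vector(freqs):
    """I(j) = sum_i f'(i,j) * ln(4 * f'(i,j)); zero frequencies contribute 0."""
    return [sum(f * math.log(4 * f) for f in col.values() if f > 0)
            for col in freqs]

def similarity_score(freqs, window):
    """(Current - Min) / (Max - Min) for a window as long as the matrix."""
    info = information_vector(freqs)
    current = sum(i * col[b] for i, col, b in zip(info, freqs, window))
    low = sum(i * min(col.values()) for i, col in zip(info, freqs))
    high = sum(i * max(col.values()) for i, col in zip(info, freqs))
    # Assumes the matrix is not completely uniform, so that high > low.
    return (current - low) / (high - low)

freqs = [
    {"A": 0.8, "T": 0.1, "C": 0.1, "G": 0.0},
    {"A": 0.0, "T": 0.0, "C": 0.2, "G": 0.8},
]
best = similarity_score(freqs, "AG")    # most frequent base in each column
worst = similarity_score(freqs, "GA")   # least frequent base in each column
middle = similarity_score(freqs, "CG")
```

The same function gives the CSS when it is applied to the core columns only.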

- Two cutoffs are kept, one for the CSS and one for the MSS.
Procedure:
- A window consisting of five nucleotides moves along the sequence.
- The CSS (core similarity score) is calculated for each window.
- For each CSS higher than the CSS cutoff, the window is prolonged at both ends to fit the matrix length. Then the MSS is calculated.
- If both scores are higher than their cutoffs, the window is output as a "yes" instance.
(Figure: a window sliding along ATCGTACTAGCTACGATCAA. Calculate the CSS, check whether it is above the CSS threshold, prolong the window, calculate the MSS, and check whether it is above the MSS threshold.)
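The procedure can be sketched as a two-stage scan. This is an illustration under assumptions, not the MATCH code: the core is taken to be `core_len` consecutive matrix columns starting at a caller-supplied `core_start`, a score equal to the cutoff is accepted, and the motif matrix and sequence are invented.

```python
import math

def similarity(freqs, window):
    """(Current - Min) / (Max - Min) with information-weighted frequencies."""
    info = [sum(f * math.log(4 * f) for f in col.values() if f > 0)
            for col in freqs]
    current = sum(i * col[b] for i, col, b in zip(info, freqs, window))
    low = sum(i * min(col.values()) for i, col in zip(info, freqs))
    high = sum(i * max(col.values()) for i, col in zip(info, freqs))
    return (current - low) / (high - low)

def match_scan(seq, freqs, core_start, css_cutoff, mss_cutoff, core_len=5):
    """Return (start, CSS, MSS) for every window passing both cutoffs."""
    length = len(freqs)
    core = freqs[core_start:core_start + core_len]
    hits = []
    for c in range(len(seq) - core_len + 1):      # c: start of the core window
        start = c - core_start                    # start of the full window
        if start < 0 or start + length > len(seq):
            continue                              # cannot prolong to full length
        css = similarity(core, seq[c:c + core_len])
        if css < css_cutoff:
            continue                              # cheap rejection on the core
        mss = similarity(freqs, seq[start:start + length])
        if mss >= mss_cutoff:
            hits.append((start, css, mss))
    return hits

def column(best):
    """Hypothetical column: 0.85 for the preferred base, 0.05 otherwise."""
    return {b: (0.85 if b == best else 0.05) for b in "ATCG"}

freqs = [column(b) for b in "TGACTCA"]
hits = match_scan("CCCCTGACTCAGGGG", freqs, core_start=1,
                  css_cutoff=0.9, mss_cutoff=0.9)
```

Most windows are discarded after scoring only the five core columns, which is the point of keeping two cutoffs.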

Incorporating the background

Background model:
- Some nucleotides in the PWM count more than others.
(Figure panels: nucleotide contents of C. efficiens for noncoding, total, and coding sequence.)

Definitions:
- Length of the PWM (number of columns): \(M\)
- Background model: \(p_i \in [0,1]\) with \(i \in \{A,T,C,G\}\) and \(\sum_i p_i = 1\), for example:
    A 0.180
    T 0.182
    C 0.330
    G 0.308
- Absolute PWM (count matrix): \(f^*_{i,j} \in \mathbb{N}\) with \(i \in \{A,T,C,G\}\) and \(j \in [0, M-1]\)
- Relative PWM (frequency matrix):
  \[ f'_{i,j} = \frac{f^*_{i,j}}{\sum_{k \in \{A,T,C,G\}} f^*_{k,j}} \]
- Pseudo-counts per column (to avoid overfitting), e.g.
  \[ f^p_{i,j} = f^*_{i,j} + p_i \;\rightarrow\; f_{i,j} = \frac{f^p_{i,j}}{\sum_{k \in \{A,T,C,G\}} f^p_{k,j}} \]

Example of a relative PWM (first columns):
       1     2     3
  A    0.8   0     ..
  T    0.1   0     ..
  C    0.1   0.2   ..
  G    0     0.8   ..
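A minimal sketch of the pseudo-count step, assuming the example rule above (add the background probability of each base to its count, then renormalize the column). The function name and the count columns are made up for illustration; the background values are the ones listed above.

```python
BACKGROUND = {"A": 0.180, "T": 0.182, "C": 0.330, "G": 0.308}

def smoothed_frequencies(counts, background):
    """Add p_i to every count f*(i,j), then renormalize each column."""
    freqs = []
    for column in counts:
        pseudo = {b: column[b] + background[b] for b in background}
        total = sum(pseudo.values())
        freqs.append({b: pseudo[b] / total for b in background})
    return freqs

# Hypothetical count columns (10 sites each), for illustration only.
counts = [
    {"A": 8, "T": 1, "C": 1, "G": 0},
    {"A": 0, "T": 0, "C": 2, "G": 8},
]
freqs = smoothed_frequencies(counts, BACKGROUND)
```

After smoothing no entry is zero, so the logarithm in a log-odds score is always defined.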

Matching procedure:
- Scoring function (log-odds score):
  \[ S_{startidx,\,endidx} = \sum_{j=startidx}^{endidx} \ln\frac{f_{nu(j),j}}{p_{nu(j)}} \]
  where nu(j) = nucleotide with index j.
  Seq = A G C A A T T A A A T T G G A T A A C ..
  (Figure: the PWM aligned under the sequence.)
- Calculate the score \(S_{0,M-1}\) for every position of the sliding window.
- Report every match with \(S_{0,M-1} > th\) (th is the threshold for being a signal).
But how to set a good threshold value?
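The sliding-window matching procedure might look as follows in Python. This is a sketch: the function names, the two-column frequency matrix, the uniform background, and the threshold are assumptions chosen for the example, and the frequencies are assumed to be strictly positive (for instance after adding pseudo-counts).

```python
import math

def log_odds_matrix(freqs, background):
    """ln(f(i,j) / p_i) for every base i and column j."""
    return [{b: math.log(col[b] / background[b]) for b in col} for col in freqs]

def scan(seq, lod, th):
    """Score every window of length M; report (start, score) when score > th."""
    m = len(lod)
    hits = []
    for start in range(len(seq) - m + 1):
        score = sum(lod[j][seq[start + j]] for j in range(m))
        if score > th:
            hits.append((start, score))
    return hits

# Hypothetical two-column PWM and uniform background.
freqs = [
    {"A": 0.7, "T": 0.1, "C": 0.1, "G": 0.1},
    {"A": 0.1, "T": 0.1, "C": 0.1, "G": 0.7},
]
background = {b: 0.25 for b in "ATCG"}
lod = log_odds_matrix(freqs, background)
hits = scan("TTAGTT", lod, th=1.0)
```

Every window costs M lookups, so the whole scan takes O(MN) time for a sequence of length N.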

Score distribution:
- \(l_B(X)\): score distribution of the PWM calculated with random sequences according to the background model.
- \(l_T(X)\): score distribution calculated with random sequences according to the PWM model.
- \(P_Z(X = s)\): probability of observing score s under distribution Z.
We are interested in \(P_Z(X \ge s)\), the probability of observing at least score s, with
\[ P_Z(X \ge s) = \sum_{i=s}^{\max} P_Z(X = i) \]

p-value: the probability of observing at least score s by chance:
\[ p = P_B(X \ge s) = \sum_{x=s}^{\max} P_B(X = x) \]
→ Set th = s, with \(P_B(X \ge s) = p\) for a given p.
(Pictures: assuming a standard normal distribution.)
Set equal false positive and false negative errors:
- Set th = s, where \(P_B(X \ge s) = P_T(X \le s)\).
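One way to obtain such a score distribution without sampling random sequences is to build it column by column. The sketch below is an assumption about how this could be done, not a method taken from the slides: scores are rounded to six decimals so that equal sums share one dictionary key, the per-column probabilities are passed explicitly (the background repeated M times for the background distribution, the PWM frequencies for the PWM distribution), and the integer score matrix is invented.

```python
def score_distribution(scores, column_probs):
    """Exact distribution {score: probability}, built one column at a time."""
    dist = {0: 1.0}
    for column, probs in zip(scores, column_probs):
        nxt = {}
        for partial, p in dist.items():
            for base, s in column.items():
                key = round(partial + s, 6)
                nxt[key] = nxt.get(key, 0.0) + p * probs[base]
        dist = nxt
    return dist

def tail_probability(dist, s):
    """P(X >= s)."""
    return sum(p for score, p in dist.items() if score >= s)

def threshold_for_p_value(dist, p):
    """Smallest attainable score s with P(X >= s) <= p, or None."""
    for s in sorted(dist):
        if tail_probability(dist, s) <= p:
            return s
    return None

# Hypothetical integer score matrix with two columns.
scores = [
    {"A": 2, "T": -1, "C": -1, "G": -1},
    {"A": -1, "T": -1, "C": 2, "G": -1},
]
background = {b: 0.25 for b in "ATCG"}
dist_b = score_distribution(scores, [background] * len(scores))
th = threshold_for_p_value(dist_b, 0.1)
```

With these numbers only the window AC reaches the top score, so a p-value of 0.1 selects that score as the threshold.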

Methods to speed up the general matching approach
- The general matching approach aims to find binding sites by moving a window of length M along a sequence of length N.
- The time complexity of a straightforward implementation is O(MN).
- Several methods have been implemented to speed up PWM/PSSM matching:
  o Lookahead algorithm
  o Permuted lookahead algorithm
  o Suffix tree
  o Enhanced suffix array

Let's speed it up!
Kirk: How much time do you need, Scotty?
Scotty: Gimme 20 minutes.
Kirk: You got 10.
Scotty: OK. I'll do it in 5.
Two minutes later ...

Lookahead algorithm
- The motivation: given a segment of the sequence, we want to know as early as possible whether we can reject it as a signal.
  o For a given sequence segment of length M, we have the score function:
    \[ S_{0,M-1} = \sum_{j=0}^{M-1} \ln\frac{f_{nu(j),j}}{p_{nu(j)}} \qquad (1) \]
  o We define the minimum and maximum score for a given PWM:
    \[ S_{\min}(0,M-1) = \sum_{j=0}^{M-1} \min_{a \in \{A,T,G,C\}} \ln\frac{f_{a,j}}{p_a} \qquad (2) \]
    \[ S_{\max}(0,M-1) = \sum_{j=0}^{M-1} \max_{a \in \{A,T,G,C\}} \ln\frac{f_{a,j}}{p_a} \qquad (3) \]

Lookahead algorithm
  o For any d with \(0 \le d \le M-1\), we also define the prefix score of depth d:
    \[ pfxs_d = S_{0,d} = \sum_{j=0}^{d} \ln\frac{f_{nu(j),j}}{p_{nu(j)}} \qquad (4) \]
  o And the maximal score in the last M-d-1 positions of the PWM:
    \[ \sigma_d = S_{\max}(d+1, M-1) = \sum_{j=d+1}^{M-1} \max_{a \in \{A,T,G,C\}} \ln\frac{f_{a,j}}{p_a} \qquad (5) \]
  o Finally, we can calculate the intermediate threshold at position d:
    \[ th_d = th - \sigma_d \qquad (6) \]

Lookahead algorithm
- Therefore, the following statements are equivalent:
  \[ S_{0,M-1} \ge th \;\Leftrightarrow\; pfxs_d \ge th_d \text{ for all } d\ (0 \le d \le M-1) \]
- Basically, when a prefix has a score so low that the score of the whole segment stays below the threshold even if the rest of the segment achieves the maximal score, we can reject the segment.
(Figure: a window on ATGCGCTTAAGTCTGTGGTCAAATGCTAGCTACGTACGATCGATC, split into the prefix with score \(pfxs_d\) and the remainder with maximal score \(\sigma_d\). Check whether \(pfxs_d\) is above \(th_d\) at every position; if not, reject the segment.)
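A small sketch of the lookahead idea, with hypothetical names and an invented integer score matrix (integers keep the comparison with the naive scan free of rounding effects). The scan stops scoring a window as soon as its prefix score drops below the intermediate threshold, and also counts how many column lookups it needed.

```python
def intermediate_thresholds(scores, th):
    """th_d = th - sigma_d, sigma_d = best score obtainable in columns d+1..M-1."""
    m = len(scores)
    sigma = [0] * m
    for d in range(m - 2, -1, -1):
        sigma[d] = sigma[d + 1] + max(scores[d + 1].values())
    return [th - s for s in sigma]

def lookahead_scan(seq, scores, th):
    """Start positions with score >= th, plus the number of column lookups."""
    m = len(scores)
    th_d = intermediate_thresholds(scores, th)
    hits, lookups = [], 0
    for start in range(len(seq) - m + 1):
        prefix = 0
        for d in range(m):
            prefix += scores[d][seq[start + d]]
            lookups += 1
            if prefix < th_d[d]:
                break                 # even a perfect remainder cannot reach th
        else:
            hits.append(start)
    return hits, lookups

def naive_scan(seq, scores, th):
    m = len(scores)
    return [s for s in range(len(seq) - m + 1)
            if sum(scores[j][seq[s + j]] for j in range(m)) >= th]

def column(best):
    """Hypothetical score column: 3 for the preferred base, -2 otherwise."""
    return {b: (3 if b == best else -2) for b in "ATCG"}

scores = [column("A"), column("C"), column("G")]
seq = "TTACGTTAGGACT"
hits, lookups = lookahead_scan(seq, scores, th=4)
```

Both scans report the same positions; the lookahead version simply performs fewer lookups on windows that fail early.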

Permuted lookahead algorithm
- With the lookahead algorithm, the sooner we reject a segment, the better the running time.
- Therefore, it makes sense to first check the positions of the PWM at which a segment is most likely to be rejected by the lookahead algorithm. We implement this idea by a permutation of the PWM.
- Each column j of the PWM has a highest score:
  \[ M_j = S_{\max}(j,j) = \max_{a \in \{A,T,G,C\}} \ln\frac{f_{a,j}}{p_a} \]
- and an expectation of the score if the residue is generated by the background model:
  \[ E_j = \sum_{a \in \{A,T,G,C\}} p_a S_{a,j} = \sum_{a \in \{A,T,G,C\}} p_a \ln\frac{f_{a,j}}{p_a} \]

Permuted lookahead algorithm
- We focus on the difference between \(M_j\) and \(E_j\). If the expectation for a column is low compared with its highest score, then the segment is more likely to be rejected at this column.
- Therefore, we order the matrix by \((M_j - E_j)\), and compute the most dangerous column first.

Before the permutation:
  Position             0 1 2 3 4 5 6 7 8
  Segment              A T G C G A T C G
  Rank of difference   1 3 4 6 2 7 5 9 8

After ordering by difference:
  Original position    0 4 1 2 6 3 5 8 7
  Segment              A G T G T C A G C
  Rank of difference   1 2 3 4 5 6 7 8 9

(Figure: the permuted segment split into the prefix with score \(pfxs_d\) and the remainder with maximal score \(\sigma_d\).)
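The permutation can be sketched as follows, assuming that "most dangerous" means the largest difference between the column maximum and its background expectation. All names and the integer score matrix are hypothetical.

```python
def column_order(scores, background):
    """Column indices sorted by (M_j - E_j), largest difference first."""
    def gap(j):
        col = scores[j]
        best = max(col.values())                              # M_j
        expected = sum(background[b] * col[b] for b in col)   # E_j
        return best - expected
    return sorted(range(len(scores)), key=gap, reverse=True)

def permuted_lookahead_scan(seq, scores, background, th):
    """Lookahead scan that visits the PWM columns in permuted order."""
    m = len(scores)
    order = column_order(scores, background)
    # sigma[r]: best score still obtainable after the first r+1 visited columns
    sigma = [0] * m
    for r in range(m - 2, -1, -1):
        sigma[r] = sigma[r + 1] + max(scores[order[r + 1]].values())
    hits = []
    for start in range(len(seq) - m + 1):
        partial = 0
        for r, j in enumerate(order):
            partial += scores[j][seq[start + j]]
            if partial < th - sigma[r]:
                break
        else:
            hits.append(start)
    return hits

def naive_scan(seq, scores, th):
    m = len(scores)
    return [s for s in range(len(seq) - m + 1)
            if sum(scores[j][seq[s + j]] for j in range(m)) >= th]

# Hypothetical integer scores; the middle column is the most selective one.
scores = [
    {"A": 1, "T": 0, "C": 0, "G": 0},
    {"A": -3, "T": -3, "C": 4, "G": -3},
    {"A": 0, "T": 0, "C": 0, "G": 2},
]
background = {b: 0.25 for b in "ATCG"}
order = column_order(scores, background)
seq = "TACGTTCGACTAGC"
hits = permuted_lookahead_scan(seq, scores, background, th=5)
```

The set of reported windows does not depend on the visiting order; only the point at which a failing window is abandoned changes.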

Suffix tree
1. A suffix tree is a data structure that represents all the suffixes of a given string.
2. A suffix tree for a string w is a tree whose edges are labeled with substrings. Each suffix of w corresponds to exactly one path from the tree's root to a leaf.
3. A suffix tree is a special data structure that allows a number of string operations to be carried out in an efficient way.
(Figure: suffix tree for the example string, which ends with a terminator character. The 12 paths from the root to a leaf correspond to the 12 suffixes.)

Suffix tree
(Table: the 12 suffixes of the example string, numbered 0 to 11.)

Suffix tree
Key features: a suffix tree T for a string w[0, m-1] is a rooted tree with:
1. m leaves numbered from 0 to m-1
2. At least two children for each internal node (except the root)
3. Each edge label represents a nonempty substring of w
4. No two edges out of the same node begin with the same character

Applications of the suffix tree
- One of the simplest applications of a suffix tree is to check whether a string P of length m is a substring of a given string w in O(m) time.
- Construct the suffix tree T of string w, and match string P along a path from the root toward a leaf.
- If there exists a complete match, then P is a substring of w; otherwise, it is not.
(Figure: checking whether a pattern is a substring of the example string.)
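The substring check can be illustrated with a simplified stand-in for the suffix tree: an uncompressed suffix trie, in which every edge carries a single character. This is an assumption made to keep the example short. It needs O(n^2) space and is not the compact suffix tree itself, but matching a pattern P still takes O(m) steps. The function names and the example string are hypothetical.

```python
def build_suffix_trie(w):
    """Insert every suffix of w into a nested-dict trie (one char per edge)."""
    root = {}
    for i in range(len(w)):
        node = root
        for ch in w[i:]:
            node = node.setdefault(ch, {})
    return root

def is_substring(trie, p):
    """Walk down from the root along p; succeed if no character is missing."""
    node = trie
    for ch in p:
        if ch not in node:
            return False
        node = node[ch]
    return True

trie = build_suffix_trie("xabxac$")
found = is_substring(trie, "bxa")
missing = is_substring(trie, "abc")
```

The walk works because every substring of w is a prefix of some suffix of w.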

Applications of the suffix tree
- Besides, there are many other applications of the suffix tree. Given a suffix tree of a string w of length n:
1. Find the first occurrence of the patterns \(P_1, \ldots, P_q\) of total length m in O(m) time.
2. Search for a regular expression P in time expected sublinear in n.
3. Find the longest common substrings of strings \(w_i\) and \(w_j\) in Θ(\(n_i + n_j\)) time.
4. Find the longest repeated substring in Θ(n) time.
5. ...

How to grow a suffix tree (naïve method)
- The running time for the naïve construction of a suffix tree is \(O(n^2)\) (n: text size).
- For example, we want to construct a suffix tree of the string xabxac$.
1. Start with the whole string (leaf number 0) and connect the root with the leaf.
(Figure: the root connected to leaf 0 by an edge labeled xabxac$.)

How to grow a suffix tree (naïve method)
2. Generate the suffixes w[1...n-1], w[2...n-1], ..., w[n-1], and push them into the tree one by one.
Suffixes:
- abxac$
- bxac$
- xac$
- ac$
- c$
- $
(Figure: the root connected to leaf 0 by an edge labeled xabxac$.)

How to grow a suffix tree (naïve method)
3. To insert \(Sfx_i\) = w[i...n-1], follow the path from the root, matching characters of \(Sfx_i\) until the first mismatch at the character \(Sfx_i[j]\). There are two cases:
i. If the matching cannot continue from a node (which means the mismatch happens at the beginning of the next edge), then create a new node and label the new edge with the corresponding substring.
(Figure: inserting the second and third suffixes, abxac$ and bxac$, adds leaves 1 and 2 directly below the root.)

How to grow a suffix tree (naïve method)
ii. If the mismatch occurs in the middle of an edge e = (u,v), then denote the label of the edge by \(a_0 \ldots a_{l-1}\). Let the mismatch occur at \(a_k\); then create a new node w, and replace edge e by edges (u,w) and (w,v), labeled \(a_0 \ldots a_{k-1}\) and \(a_k \ldots a_{l-1}\). Then create another new node to store the rest of the newly inserted suffix.
(Figure: the insertion of xac$ splits the first edge xabxac$ into xa and bxac$, and adds leaf 3.)

How to grow a suffix tree (naïve method)
The same thing happens when inserting ac$. After inserting ac$, c$ and $, the suffix tree is complete. Finally, in both cases, a new leaf is created, numbered i.
(Figure: the complete suffix tree with leaves 0 to 6.)
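The naïve insertion procedure can be sketched in Python as below. The class and function names are hypothetical, and the example string `xabxac$` is an assumption; the code relies on the string ending with a terminator that occurs nowhere else, so that no suffix is a prefix of another suffix.

```python
class Node:
    def __init__(self, leaf=None):
        self.children = {}   # first character -> (edge label, child node)
        self.leaf = leaf     # suffix start index for leaves, None otherwise

def build_suffix_tree(w):
    """Insert the suffixes w[0:], w[1:], ... one by one (quadratic time)."""
    root = Node()
    for i in range(len(w)):
        node, suffix = root, w[i:]
        while True:
            first = suffix[0]
            if first not in node.children:
                # Case i: mismatch at a node, hang a new leaf below it.
                node.children[first] = (suffix, Node(leaf=i))
                break
            label, child = node.children[first]
            k = 0
            while k < len(label) and label[k] == suffix[k]:
                k += 1
            if k == len(label):
                # Whole edge matched: continue from the child node.
                node, suffix = child, suffix[k:]
                continue
            # Case ii: mismatch inside the edge, split it at position k.
            middle = Node()
            middle.children[label[k]] = (label[k:], child)
            middle.children[suffix[k]] = (suffix[k:], Node(leaf=i))
            node.children[first] = (label[:k], middle)
            break
    return root

def leaf_numbers(node):
    """Collect the leaf numbers below a node."""
    if node.leaf is not None:
        return [node.leaf]
    return [n for _, child in node.children.values() for n in leaf_numbers(child)]

root = build_suffix_tree("xabxac$")
leaves = sorted(leaf_numbers(root))
```

Every insertion ends by creating exactly one new leaf numbered i, so the finished tree has one leaf per suffix.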

PWM/PSSM using a suffix tree
- How can a suffix tree accelerate the process of matching?
(1) We first find the proper length of the target sequence segment. The length can be decided based on memory size.
(2) Then we construct suffix trees from the target sequence.

PWM/PSSM using a suffix tree
(3) Then a depth-first traversal of the tree is performed, calculating all the prefix scores (\(pfxs_d\)) for the edge labels.
Suppose we have score functions for a PWM of length 2 (values shown in the figure) and a given threshold th = 6.
We have intermediate thresholds: \(th_0 = 3\), \(th_1 = 6\).
Afterwards, we calculate all the prefix scores for the edge labels.
(Figure: the suffix tree annotated with prefix scores.)

PWM/PSSM using a suffix tree
(4) Finally, analyze the scores and check whether either of two cases happens:
i. If the score at some node in the tree reaches the threshold, then all the suffixes in the subtree below that node reach the threshold as well.
ii. Similarly, if a score falls below the intermediate threshold, then the whole branch can be ignored.
(Figure: the red zone shows the branches having scores below the intermediate threshold; the green zone shows the branches having scores above the intermediate threshold.)

Suffix tree → Suffix array

Enhanced suffix array
- M. Beckstette et al. (2006) brought forward a PWM-based searching method using enhanced suffix arrays.
- In their study, they focused on improving space efficiency when searching with a PWM. Their method is similar to the suffix tree method discussed in the previous slides.
- Main features: three arrays are kept for different usages:
1. suf array: specifies the first index of each suffix.
2. lcp array: stores the length of the longest common prefix of two adjacent suffixes according to leaf numbers.
3. skp array: sorry, a little bit complex; we talk about it in the following slides.

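Enhanced suffix array (array suf)
The suf array specifies the first index of each suffix.
\(S_{suf[0]}, S_{suf[1]}, \ldots, S_{suf[n]}\) is the sequence of suffixes of S in ascending lexicographic order, where \(S_{suf[i]}\) = S[suf[i] ... n]. Here i is the index of a suffix when the suffixes are ordered lexicographically, and the terminator $ sorts after every other character.

For S = caaaaccacac$:
  i    suf[i]   S_suf[i]
  0    1        aaaaccacac$
  1    2        aaaccacac$
  2    3        aaccacac$
  3    7        acac$
  4    4        accacac$
  5    9        ac$
  6    0        caaaaccacac$
  7    6        cacac$
  8    8        cac$
  9    5        ccacac$
  10   10       c$
  11   11       $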
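A suffix array can be sketched by simply sorting the suffix start positions. The example string `caaaaccacac$` is an assumption reconstructed from the index table, and so is the convention that the terminator `$` sorts after every letter (implemented here by comparing it as `~`). Sorting whole suffixes is the simplest possible construction, not an efficient one.

```python
def suffix_array(s):
    """Start positions of all suffixes of s in ascending lexicographic order.

    Assumes s ends with a unique '$' that should sort after every letter.
    """
    t = s.replace("$", "~")          # '~' compares greater than a-z and A-Z
    return sorted(range(len(s)), key=lambda i: t[i:])

s = "caaaaccacac$"
suf = suffix_array(s)
sorted_suffixes = [s[i:] for i in suf]
```

Reading `suf` from left to right gives the leaves of the suffix tree in left-to-right order when the children of every node are sorted.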

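Enhanced suffix array (array lcp)
Array lcp is an array ranging from 0 to n with the following features:
(1) lcp[0] = 0
(2) lcp[i] stores the length of the longest common prefix of \(S_{suf[i-1]}\) and \(S_{suf[i]}\).
The common prefix of aaaaccacac$ and aaaccacac$ is aaa, so lcp[1] = 3.
The common prefix of aaccacac$ and acac$ is a, so lcp[3] = 1.

  i    lcp[i]   S_suf[i]
  0    0        aaaaccacac$
  1    3        aaaccacac$
  2    2        aaccacac$
  3    1        acac$
  4    2        accacac$
  5    2        ac$
  6    0        caaaaccacac$
  7    2        cacac$
  8    3        cac$
  9    1        ccacac$
  10   1        c$
  11   0        $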

Enhanced suffix array (array skp)
Array skp ranges from 0 to n such that
\[ skp[i] = \min(\{n+1\} \cup \{ j \in [i+1, n] \mid lcp[i] > lcp[j] \}) \]
Geometrically, skp[i] denotes the next leaf that does not occur in the subtree below the branching node corresponding to the longest common prefix of \(S_{suf[i-1]}\) and \(S_{suf[i]}\).
→ skp[i] is the next index j where lcp[j] < lcp[i].

  i    lcp[i]   skp[i]
  0    0        12
  1    3        2
  2    2        3
  3    1        6
  4    2        6
  5    2        6
  6    0        12
  7    2        9
  8    3        9
  9    1        11
  10   1        11
  11   0        12
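The lcp and skp arrays follow directly from their definitions. The sketch below uses deliberately naive loops, hypothetical function names, and the assumed example string `caaaaccacac$` with `$` sorting last; it reproduces the lcp and skp columns of the table above.

```python
def suffix_array(s):
    t = s.replace("$", "~")          # assumption: '$' sorts after every letter
    return sorted(range(len(s)), key=lambda i: t[i:])

def lcp_array(s, suf):
    """lcp[0] = 0; lcp[i] = common prefix length of suffixes suf[i-1], suf[i]."""
    lcp = [0] * len(suf)
    for i in range(1, len(suf)):
        a, b = s[suf[i - 1]:], s[suf[i]:]
        while lcp[i] < min(len(a), len(b)) and a[lcp[i]] == b[lcp[i]]:
            lcp[i] += 1
    return lcp

def skp_array(lcp):
    """skp[i] = smallest j > i with lcp[j] < lcp[i], or n + 1 if none exists."""
    size = len(lcp)                  # size = n + 1
    return [next((j for j in range(i + 1, size) if lcp[j] < lcp[i]), size)
            for i in range(size)]

s = "caaaaccacac$"
suf = suffix_array(s)
lcp = lcp_array(s, suf)
skp = skp_array(lcp)
```

Here `len(s)` plays the role of n + 1, so an entry of 12 means that no later suffix leaves the current subtree.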

Enhanced suffix array (array skp)
The longest common prefix of aaccacac$ and acac$ is a, so lcp[3] = 1.
(Figure: the suffix tree with leaves numbered 0 to 11; the red edge indicates the common prefix.)

Enhanced suffix array (array skp)
Similarly, we can find out that lcp[4] = lcp[5] = 2.
(Figure: the suffix tree with leaves numbered 0 to 11; the red edge indicates the common prefix.)

Enhanced suffix array (array skp)
We cannot find a common prefix between ac$ and caaaaccacac$, so lcp[6] = 0.
Therefore, skp[3] = skp[4] = skp[5] = 6. In the graph, we can easily tell that \(S_{suf[6]}\) is the first node (colored in green) not occurring in the branch of \(S_{suf[3]}\), \(S_{suf[4]}\) and \(S_{suf[5]}\) (colored in purple).

Enhanced suffix array (array skp)
Starting from \(S_{suf[6]}\), no later index j has lcp[j] < lcp[6] = 0, so no node occurs in another branch (a branch not involved with the current suffix). Therefore, skp[6] = 12.

References
- A.E. Kel et al. MATCH(TM): a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Research, Vol. 31, No. 13 (2003).
- M. Beckstette et al. PoSSuMsearch: Fast and Sensitive Matching of Position Specific Scoring Matrices using Enhanced Suffix Arrays (2004).
- M. Beckstette et al. Fast index based algorithms and software for matching position specific scoring matrices. BMC Bioinformatics (2006).
- S. Rahmann et al. On the Power of Profiles for Transcription Factor Binding Site Detection. Statistical Applications in Genetics and Molecular Biology (2003).
- B. Dorohonceanu et al. Accelerating Protein Classification Using Suffix Trees (2000).

Thank you!

Suffix arrays/trees and PWM matching

Enhanced suffix array
- Definition (1): prefix score for a sequence w:
  \[ pfxs_d(w) = \sum_{j=0}^{d} \ln\frac{f_{w(j),j}}{p_{w(j)}}, \qquad w(j) \in \{A,T,G,C\} \text{ for all } j \]
  where w is a sequence segment and w(j) is the character of w at index j. Denote \(l_i = \min\{M, |S_{suf[i]}|\} - 1\).
- Definition (2): \(d_i\) is the largest depth of the suffix that satisfies the intermediate threshold:
  \[ d_i = \max(\{-1\} \cup \{ d \in [0, l_i] \mid pfxs_d(S_{suf[i]}) \ge th_d \}) \]
- Definition (3): \(C_i[d]\) is the prefix score of \(S_{suf[i]}\) with depth d:
  \[ C_i[d] = pfxs_d(S_{suf[i]}) \text{ for all } d \in [0, d_i] \]

Enhanced suffix array
Notice that, for each \(S_{suf[i]}\), the following statements are equivalent:
\[ d_i = M-1 \;\Leftrightarrow\; pfxs_{M-1}(S_{suf[i]}) = C_i[M-1] \ge th_{M-1} \]
where M is the length of the PWM. That is, \(S_{suf[i]}\) satisfies the threshold iff the largest depth satisfying the intermediate threshold equals M-1, the last index of the PWM.
We will show the algorithm by an example. Suppose we have the following score function \(S_{i,j}\):
        Index 0   Index 1   Index 2
  a     1         3         2
  c     3         2         1
Suppose we have the following threshold: th = 7.
Intermediate thresholds: \(th_0 = 2\), \(th_1 = 5\), \(th_2 = 7\).

Enhanced suffix array
Algorithm:
1. First compute \(C_0\) and \(d_0\) to see if the first suffix satisfies the threshold.
For \(S_{suf[0]}\) = aaaaccacac$, we have \(C_0[0] = pfxs_0(S_{suf[0]}) = 1\), which is below the threshold \(th_0\).
\[ d_0 = \max(\{-1\} \cup \{ d \in [0, l_0] \mid pfxs_d(S_{suf[0]}) \ge th_d \}) \]
Hence we have \(d_0 = -1\), meaning no prefix satisfies the threshold.
(Figure: the suffix tree with leaves numbered 0 to 11; the branch of leaf 0 is marked "below \(th_0\)".)

Enhanced suffix array
2. Afterwards, it's the VERY tricky part. Based on the skp array, we can actually JUMP over some suffixes.
For each \(S_{suf[i]}\), whether or not it satisfies the threshold, we try to find the first k such that \(d_i + 1 \ge lcp[k]\), by the following jumping cascade:
let \(k_0 = i+1\), \(k_1 = skp[k_0]\), ..., \(k_m = skp[k_{m-1}]\) such that
\(d_i + 1 < lcp[k_0]\), \(d_i + 1 < lcp[k_1]\), ..., \(d_i + 1 < lcp[k_{m-1}]\) and \(d_i + 1 \ge lcp[k_m]\).
\(k_m\) is the k we want. All suffixes within the jump range satisfy the threshold if \(S_{suf[i]}\) satisfies it, and do not satisfy it if \(S_{suf[i]}\) does not.

Enhanced suffix array
i. In the first step, we have \(d_0 = -1\).
ii. We try to find the first k such that \(d_0 + 1 = 0 \ge lcp[k]\).
iii. By making three jumps based on the skp array, we find \(k_3 = 6\) satisfying our case.
  First jump: \(k_1 = skp[k_0] = skp[1] = 2\), and \(d_0 + 1 = 0 < lcp[k_1] = 2\)
  Second jump: \(k_2 = skp[k_1] = 3\), and \(d_0 + 1 = 0 < lcp[k_2] = 1\)
  Third jump: \(k_3 = skp[k_2] = 6\), and \(d_0 + 1 = 0 \ge lcp[k_3] = 0\). YEAH, we got it!!!

  i    lcp[i]   skp[i]
  0    0        12
  1    3        2
  2    2        3
  3    1        6
  4    2        6
  5    2        6
  6    0        12
  7    2        9
  8    3        9
  9    1        11
  10   1        11
  11   0        12

Enhanced suffix array
Since \(S_{suf[0]}\) does not satisfy the threshold, \(S_{suf[1]}, \ldots, S_{suf[5]}\) cannot satisfy the threshold either: all of them lie within the jump range from k = 1 to \(k_3 = 6\).
(Figure: the branch containing leaves 0 to 5 is marked "below \(th_0\)".)

Enhanced suffix array
Next we compute \(C_6\) and \(d_6\) for \(S_{suf[6]}\) = caaaaccacac$, using
\[ d_i = \max(\{-1\} \cup \{ d \in [0, l_i] \mid pfxs_d(S_{suf[i]}) \ge th_d \}) \]
\[ C_i[d] = pfxs_d(S_{suf[i]}) \text{ for all } d \in [0, d_i] \]
We obtain \(d_6 = 2\) and \(C_6[0] = 3\), \(C_6[1] = 6\), \(C_6[2] = 8\), satisfying all intermediate thresholds (\(th_0 = 2\), \(th_1 = 5\), \(th_2 = 7\)). Therefore, \(S_{suf[6]}\) is a signal.

Enhanced suffix array
Similarly, we try to find the first k such that \(d_6 + 1 = 3 \ge lcp[k]\).
We find that \(k_0 = 6 + 1 = 7\) satisfies \(d_6 + 1 = 2 + 1 = 3 \ge lcp[7] = 2\).
Therefore, only \(S_{suf[6]}\) satisfies the threshold in this round. No JUMP here; we only move to the next node.
Next we continue to compute \(C_7\) and \(d_7\).

Enhanced suffix array
By a similar approach, we obtain \(d_7 = 2\) and \(C_7[0] = 3\), \(C_7[1] = 6\), \(C_7[2] = 7\), satisfying all intermediate thresholds.
Similarly, we try to find the first k such that \(d_7 + 1 = 3 \ge lcp[k]\).
We find that \(k_0 = 7 + 1 = 8\) satisfies \(d_7 + 1 = 2 + 1 = 3 \ge lcp[8] = 3\).
Therefore, only \(S_{suf[7]}\) satisfies the threshold in this round. No JUMP here; we only move to the next node.

Enhanced suffix array
By a similar approach, we obtain that \(S_{suf[8]}\) and \(S_{suf[9]}\) satisfy the threshold, while \(S_{suf[10]}\) and \(S_{suf[11]}\) do not. (Algorithm ends.)

Enhanced suffix array (algorithm)
1. Compute \(d_0\), and \(C_0[d]\) for all \(d \in [0, d_0]\).
2. Assume \(d_{i-1}\) and \(C_{i-1}[d]\) have been determined; then we calculate \(d_i\) and \(C_i[d]\) from \(d_{i-1}\) and \(C_{i-1}[d]\).
Since \(S_{suf[i-1]}\) and \(S_{suf[i]}\) have a common prefix of length lcp[i], we have \(C_{i-1}[d] = C_i[d]\) for all \(d \in [0, lcp[i]-1]\).
To calculate \(C_i[d]\) for all \(d \in [0, d_i]\), the following two cases need to be considered:
(1) \(d_{i-1} + 1 \ge lcp[i]\): then compute \(C_i[d]\) for \(d \ge lcp[i]\) while \(d \le l_i\) and \(C_i[d] \ge th_d\).

Enhanced suffix array (algorithm)
(2) \(d_{i-1} + 1 < lcp[i]\): let j be the minimum value in [i+1, n+1] such that all suffixes \(S_{suf[i]}, S_{suf[i+1]}, \ldots, S_{suf[j-1]}\) have a common prefix of length greater than \(d_{i-1} + 1\) with \(S_{suf[i-1]}\). Then, according to the definition:
i. If \(d_{i-1} = M-1\), then there are signals at all positions suf[r] for \(i \le r \le j-1\).
ii. If \(d_{i-1} < M-1\), then there are no signals at any position suf[r] for \(i \le r \le j-1\).
We obtain j by following a chain of entries in array skp, computing a chain of values \(j_0 = i\), \(j_1 = skp[j_0]\), ..., \(j_k = skp[j_{k-1}]\) such that \(d_{i-1} + 1 < lcp[j_0], \ldots, lcp[j_{k-1}]\) and \(d_{i-1} + 1 \ge lcp[j_k]\); then \(j = j_k\).
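The two cases can be folded into one loop, as in the following sketch. It is a simplified illustration, not the published implementation: the arrays are built naively, the example string `caaaaccacac$`, the a/c score matrix, and the rule that `$` sorts last are assumptions taken from the worked example, and `$` is never scored (a suffix that reaches `$` before depth M-1 cannot match).

```python
def build_esa(s):
    """suf, lcp, skp for a string s ending with a unique '$' (built naively)."""
    t = s.replace("$", "~")                      # assumption: '$' sorts last
    suf = sorted(range(len(s)), key=lambda i: t[i:])
    lcp = [0] * len(s)
    for i in range(1, len(s)):
        a, b = s[suf[i - 1]:], s[suf[i]:]
        while lcp[i] < min(len(a), len(b)) and a[lcp[i]] == b[lcp[i]]:
            lcp[i] += 1
    skp = [next((j for j in range(i + 1, len(s)) if lcp[j] < lcp[i]), len(s))
           for i in range(len(s))]
    return suf, lcp, skp

def esa_search(s, scores, th):
    """Start positions of all length-M windows of s with score >= th."""
    m = len(scores)
    sigma = [0] * m
    for d in range(m - 2, -1, -1):
        sigma[d] = sigma[d + 1] + max(scores[d + 1].values())
    th_d = [th - x for x in sigma]               # intermediate thresholds
    suf, lcp, skp = build_esa(s)
    hits, c, i = [], [0] * m, 0
    while i < len(s):
        pos = suf[i]
        last = min(m, len(s) - 1 - pos) - 1      # l_i, with '$' excluded
        d = lcp[i] - 1                           # depths 0..lcp[i]-1 are inherited
        while d < last:                          # case (1): extend the prefix
            score = (c[d] if d >= 0 else 0) + scores[d + 1][s[pos + d + 1]]
            if score < th_d[d + 1]:
                break
            d += 1
            c[d] = score
        match = d == m - 1
        if match:
            hits.append(pos)
        k = i + 1                                # case (2): jump along skp
        while k < len(s) and lcp[k] > d + 1:
            if match:
                hits.extend(suf[k:skp[k]])
            k = skp[k]
        i = k
    return sorted(hits)

def brute_force(s, scores, th):
    m = len(scores)
    return [p for p in range(len(s) - m)
            if sum(scores[j][s[p + j]] for j in range(m)) >= th]

scores = [{"a": 1, "c": 3}, {"a": 3, "c": 2}, {"a": 2, "c": 1}]
hits = esa_search("caaaaccacac$", scores, th=7)
```

On this example the loop evaluates only the suffixes with index 0 and 6 to 11; the suffixes with index 1 to 5 are skipped by three jumps, exactly as in the walkthrough.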