Approximate Abundance Histograms and Their Use for Genome Size Estimation
|
|
- Kerry Johns
- 5 years ago
- Views:
Transcription
1 J. Hlaváčová (Ed.): ITAT 2017 Poceedngs, pp CEUR Wokshop Poceedngs Vol. 1885, ISSN , c 2017 M. Lpovský, T. Vnař, B. Bejová Appoxmate Abundance Hstogams and The Use fo Genome Sze Estmaton Máo Lpovský, Tomáš Vnař, Boňa Bejová Faculty of Mathematcs, Physcs, and Infomatcs, Comenus Unvesty, Mlynská dolna, Batslava, Slovaka Abstact: DNA sequencng data s typcally a lage collecton of shot stngs called eads. We can summaze such data by computng a hstogam of the numbe of occuences of substngs of a fxed length. Such hstogams can be used fo example to estmate the sze of a genome. In ths pape, we study a ecent tool, Kmelght, whch computes appoxmate hstogams. We dscove an appoxmaton bas, and we popose a new, unbased veson of Kmelght. We also model the dstbuton of appoxmaton eos and suppot ou theoetcal model by expemental data. Fnally, we use anothe tool, CovEst, to compute genome sze estmates fom appoxmate hstogams. Ou esults show that although CovEst was desgned to wok wth exact hstogams, t can be used wth the appoxmate vesons, whch can be poduced n a much smalle memoy. 1 Intoducton A genome s a collecton of DNA molecules stong genetc nfomaton n a cell. It can be epesented as a set of long stngs ove the alphabet Σ = {A,C,G,T } (each stng coespondng to one chomosome). In a DNA sequencng expement, many eads ae poduced fom a genome. These eads ae shot substngs obtaned fom andom locatons of the genome, potentally wth some sequencng eos. Fo a genome to be analyzed, the DNA sequence of ndvdual chomosomes must be fst assembled fom the eads, but the pocess of genome assembly s computatonally demandng and eo-pone. Howeve, t s possble to estmate the genome sze and some othe chaactestcs wthout the need of genome assembly [2, 9, 5, 8]. These estmates ae computed fom a summay statstc of eads called the k-me abundance hstogam. A k-me s a substng of length exactly k and the hstogam summazes the numbe of occuences of ndvdual k-mes n the nput set of eads. Most of the hstogam computng methods count the occuences of each k-me n the nput data usng hash tables o suffx aays [6, 4, 3]. Consequently, the memoy usage nceases at least lnealy wth the numbe of pocessed dstnct k-mes. To educe the memoy equements, k-me abundance hstogams can be computed appoxmately. One of the newest such algothms, Kmelght [8], combnes the technques of samplng and hashng to mantan a sketch of k-mes and fom the contents of the sketch computes an estmate of the hstogam. The appoxmate hstogams wee aleady used as an nput fo genome sze estmaton [8], howeve the mpact of the appoxmaton eos on the accuacy of the esultng estmates was not evaluated. In ths pape, we study the chaacte of eos of the Kmelght s hstogams and the nfluence on the subsequent genome sze estmaton. We stat wth an empcal study of the appoxmaton eos of Kmelght algothm, and we dscove that Kmelght poduces systematcally based estmates of some hstogams. We explan the souce of the bas and popose an unbased modfcaton of Kmelght. Next we model the dstbuton of Kmelght s eos wth the nomal dstbuton, and we popose a fomula that descbes Kmelght s vaance. We then expementally test ou theoetcal model and exploe ts lmtatons. Fnally, we use CovEst softwae [2] to estmate the szes of smulated genomes fom appoxmate hstogams poduced by Kmelght. We descbe how dffeent popetes of the nput nfluence the accuacy of the estmates, and we compae the estmates based on exact hstogams to the estmates based on appoxmate hstogams. 2 Eos n Kmelght s Appoxmate Abundance Hstogams A k-me s a substng of length exactly k. A ead of length thus contans k + 1 k-mes. If a k-me occus tmes among all nput eads, we say ts abundance s. In ths pape, we use the value k = 21 as n pevous woks [9, 2]. Defnton 1. The k-me abundance hstogam s a sequence f = f 1, f 2,... f m, whee f s the numbe of k-mes that occu n the nput set exactly tmes, and m s the maxmum obseved abundance. Countng the abundance of each ndvdual k-me appeang n a lage nput fle eques lage memoy. As we ae only nteested n the hstogam, the poblem of k- me countng can be avoded, allowng us to estmate the hstogam moe effcently. We can educe the amount of equed memoy fom dozens of ggabytes to hundeds of megabytes, allowng these computatons to be pefomed on a pesonal compute athe than on a cluste. We wll consde steamng algothms, whch n geneal pocess a sequence of tems (n ou context k-mes) n a sngle pass usng only a lmted amount of memoy and tme. These algothms mantan an appoxmate summay, o a sketch, of the pevously vewed k-mes and
2 28 M. Lpovský, T. Vnař, B. Bejová wth each new k-me the sketch s updated. When all the k-mes ae pocessed, the sketch can be analyzed to povde the estmate of the k-me abundance hstogam. In 2002, Ba-Yossef et al. pesented thee algothms fo estmatng the numbe of dstnct elements n a steam (F 0 = m =1 f ) wth theoetcal guaantees [1]. In 2014, Melsted and Halldósson [5] mplemented and extended Algothm 2 fom the afoementoned pape [1] and used t fo k-me countng. The algothm KmeSteam can also estmate f 1, the numbe of k-mes wth abundance one. KmeSteam was futhe mpoved by Svadasan et al. n 2016 [8], and the softwae Kmelght estmates the whole hstogam ( f 1,..., f m ). As we focus on Kmelght n ou wok, we wll descbe t n detal. 2.1 Kmelght Kmelght mantans a sketch of pocessed k-mes. Its output ae estmates ˆF 0, ˆf 1,..., ˆf m obtaned fom the fnal content of the sketch. The scheme uses paametes W,, u and two hash functons g : Σ k {0,...,2 W 1} and h : Σ k {0,..., 1} {0,...,u 1}. In the analyss, we wll assume that g and h hash each stng fom Σ k unfomly and ndependently ove the ange of values. The sketch conssts of W levels. Each level w has a hash table wth countes T w [0],...,T w [ 1]. Each counte stoes ts value T w [c].v (the abundance of a patcula k- me) and an auxlay nfomaton T w [c].p {0,...,u 1}. To pocess an nput k-me x we fst select ts level w as the numbe of talng zeoes n the bnay epesentaton of g(x). As a esult, the pobablty of selectng level w s 1/2 w. Next, usng has functon h, the k-me s hashed nto a pa (c, j) and pocessed as follows: If counte T w [c] s empty, ts value s nceased to 1 and j s stoed as an auxlay nfomaton T w [c].p. If T w [c] s not empty, and T w [c].p equals j, the counte value T w [c].v s ncemented. Fnally, f T w [c].p j, the counte s maked as dty. Dty countes ae not modfed by futue updates. Note that all occuences of the same k-me wll be stoed n the same counte at the same level. The value of ths counte should coespond to the abundance of ths k-me. Snce two o moe dffeent k-mes may hash nto the same counte, collsons may occu. The auxlay nfomaton helps to detect some of these collsons. Estmato of F 0. Snce on aveage F 0 /2 w dstnct k-mes ae hashed nto level w, the pobablty that a counte at ths level emans empty s appoxmately p = (1 1 ) F 0/2 w. In ths estmate and n all subsequent analyses, we assume that the numbe of dstnct k-mes at level w s exactly F 0 /2 w, although n fact t s a bnomal andom vaable wth ths numbe as the mean. The expected numbe of empty countes at level w s thus p. Let us denote the numbe of obseved empty countes at level w as t (w) 0. Usng the assumpton t (w) 0 p we can deve the estmato of F 0 : ˆF 0 = 2 w ln(t(w) 0 /) ln ( 1 1 ). (1) To estmate the numbe of dstnct k-mes F 0 usng ths fomula, we choose level w = w, so that the numbe of empty countes t (w) 0 at ths level s closest to /2. It has been shown that selectng ths level povdes a bounded eo of ˆF 0 wth a guaanteed pobablty [1]. Estmato of f. The expected numbe of dstnct k-mes wth abundance hashed to level w s f /2 w. When a k-me s hashed nto level w, the pobablty that t s stoed n a collson-fee counte s (1 1 )F 0/2 w 1, whch s the pobablty that no othe k-me fom level w wll get hashed nto the same counte. Thus we can estmate the numbe of collson-fee countes wth value as ( f /2 w 1 1 ) F0 /2 w 1 (2) If we denote the numbe of obseved collson-fee countes wth value as t (w), we can deve the estmato of f : ( ˆf = t (w) 2 w 1 1 ) 1 F0 /2 w (3) To estmate f, the algothm selects level w = w so that t maxmzes t (w) the numbe of obseved collson-fee countes wth value. Undetected collsons. The value t (w) s based on the numbe of non-dty countes wth value, but these nclude both tue postves (collson-fee countes) and false postves (countes wth undetected collsons). The expected numbe of false postve at one level s at most /u, whee 0,...u 1 s the ange of auxlay nfomaton [8]. Paamete u can be set to make false postves neglgble; thus we wll assume that all collsons ae beng detected. Medan amplfcaton. To decease the vaance of the estmates and to use of multpocessng, t ndependent nstances of Kmelght s sketch ae un concuently. Estmate ˆF 0 s selected as the medan of ˆF (1) 0,..., ˆF (t) 0, and fnal estmates of f ae also the medans of t nstances. Accuacy and complexty. Paametes, u ant t povde a tade-off between the memoy and accuacy. Assumng that W = Θ(logF 0 ), the algothm uses O(t log(f 0 )) memoy wods, and pocessng of one k-me eques O(t) tme. The authos have shown that the algothm computes estmates ˆF 0 and ˆf fo suffcently lage f ( f F 0 /λ) wth a bounded elatve eo (1 ε)f 0 ˆF 0 (1 + ε)f 0, (1 ε) f ˆf (1 + ε) f wth pobablty at least 1 δ, when the paametes ae set as follows: t = O(log(λ/δ)), = O( λ ) and u = O( λf ε 2 0 ). Due to loose ε 2
3 Appoxmate Abundance Hstogams and The Use fo Genome Sze Estmaton 29 eos ( ˆf f )/ f and the vaance ncease wth deceasng f (data not shown). Kmelght guaantees bounded eos fo hgh values of f, but a theoetcal quanttatve analyss of the eo dstbuton was not pesented n the pevous wok [8]. We povde a quanttatve estmate of the vaance n Secton 2.4. Fgue 1: The exact values f poduced by Jellyfsh (blue) and appoxmate values ˆf poduced by Kmelght aveaged fom 50 tals (oange). The eo bas expess the standad devaton of ˆf. Columns fo = 1,2 ae omtted, havng values appoxmately 10 7,10 6 espectvely. constants, these asymptotc estmates cannot be used to set paametes fo obtanng desable eo bounds. The accuacy of the algothm was tested expementally wth abtay paametes W = 64,t = 7, {2 16,2 18 },u = Empcal Study of Appoxmaton Eos To study the chaacte of appoxmaton eos, we stat wth seveal obsevatons on geneated data. We have fst geneated a genome: a andom sequence g 1...g L consstng of L chaactes A,C,G,T, each wth pobablty 1/4. Next we have geneated c L/l eads of length l = 100, whee c s a paamete descbng the depth of coveage of the genome by eads. Fo each ead, we have unfomly selected a andom statng poston s {1,...,L l + 1}. The ead then conssts of chaactes g s g s+1...g s+l 1. Fnally we have smulated sequencng eos by modfyng each base andomly wth pobablty e. Ou ntal expement has used a sngle data set wth paametes L = 10 6, c = 50, and e = Fo ths data set, Fgue 1 shows the exact k-me abundance hstogam (k = 21) poduced by Jellyfsh [4] and the means and standad devatons of the estmates ˆf fom 50 Kmelght uns on the same nput. Kmelght was un wth paametes W = 64, t = 7, = 2 15, u = Pehaps the most stkng featue of ths hstogam s that Kmelght systematcally oveestmates values of f. In Secton 2.3, we clafy the souce of ths bas and pesent an unbased estmato. Kmelght guaantees pecse estmates of those values of f that ae close to F 0. Unfotunately, n sequencng data, sequencng eos often ceate many unque k-mes, and thus we have f 1 close to F 0, whle the emanng f ae much smalle. In the exteme case of vey low values of f, the pobablty that at least one k-me wth abundance becomes stoed n a collson-fee counte at any level appoaches to zeo. If no k-me suvves the collsons (t (w) = 0) then ˆf = 0. In ou expement, Kmelght poduced zeo estmates fo f < 500. Fgue 1 shows that the absolute eo ( ˆf f ) and vaance of ˆf decease wth deceasng f. Howeve, elatve 2.3 Kmelght Appoxmaton Bas and Unbased Veson As we wll explan n ths secton, the bas of Kmelght estmates towads hghe values s caused by ts selecton of level w used to calculate ˆf. Level w s chosen to maxmze t (w) the numbe of collson-fee countes wth value at level w. We beleve that the authos hoped to mnmze the vaance of the estmato ˆf by ncludng as many k-mes n the estmate as possble. Analytcal w +. To undestand whch levels w ae selected by Kmelght, we wll analytcally fnd the level w + that maxmzes E(t (w) ) the expected numbe of collson-fee countes wth value. As shown n equaton (2), E(t (w) ) = f /2 w ( ) 1 1 F0 /2 w 1. The value of w + at whch E(t (w) ) s maxmzed can be obtaned fom nequaltes E(t (w+ 1) ) E(t (w+ ) ) E(t (w+ +1) ). By manpulatng the nequaltes we get ( ) F0 /2 w+ and fnally, we can calculate w + : 1 2, (4) lgf 0 + lglg 1 1 w+ lgf 0 + lglg 1. Symbol lg denotes the bnay logathm. Note these nequaltes have at most two ntege solutons. The choce of level w + does not depend on values of o f, but only on F 0 and. Snce w + 1 = w+ 2 = = w+ m, fom now on we wll efe to ths level smply by w +. Ths level also maxmzes the expected numbe of all non-empty collson-fee countes E(t (w) ), whee t (w) = m =1 t(w). Explanaton of bas. The numbe of collson-fee countes at levels w + 1 and w s typcally smla to the numbe of collson-fee countes at level w +. Thee ae two tmes moe k-mes hashed nto level w + 1 than nto w +, but also moe collsons happen at w + 1. These two effects patally cancel each othe and mantan smla values of E(t (w) ) fo w = w + 1,w +,w As a esult, any of these levels can hold the maxmal t (w) and become chosen by Kmelght as ts w. Ths typcally leads to values of t (w ) hghe than the expected value E(t (w) ) fo fxed level w = w. To obtan ˆf, value t (w ) s multpled by
4 30 M. Lpovský, T. Vnař, B. Bejová Fgue 2: Each column shows up to seven levels w that wee selected by one of t = 7 nstances of Kmelght n the expement fom Secton 2.2. If a level was chosen by moe nstances, ts ccle s dake. Note that f no countes hold value n the whole sketch, Kmelght s nstance does not choose any level. Fgue 4: Mean (top) and standad devaton (bottom) of estmate eos ( ˆf f ) poduced by the ognal (blue) and modfed (oange) Kmelght n 300 tals. Fgue 3: Values p e, p c f, p c epesent the theoetcal factons of empty, collson-fee and collded countes at each level espectvely. To make the values compaable to the esults fom Secton 2.2, we set F 0 = , = ) 1 F0 /2 w. Thus based t (w ) 2 w (1 1 also leads to based estmato ˆf. To llustate ths, Fgue 2 shows values w fo dffeent f extacted fom one Kmelght un. The analytcal level w + = 9 s selected most fequently, but Kmelght often chooses level 8 o 10 to maxmze t (w). Note that fo 3 15 and 40, the values of f ae low, and thus t (w) have a hgh elatve vaance. The maxmal t (w) can thus be eached at levels moe dstant fom w + = 9. In ode to futhe demonstate the smlaty of E(t (w) ) at the levels close to w +, Fgue 3 dsplays the theoetcal factons of empty, collded, and non-empty collson-fee countes at dffeent levels of the sketch. Focusng on a sngle level w, let us denote as p c f the pobablty that a sngle counte s non-empty and collson-fee, p e the pobablty that ths counte s empty, and p c the pobablty that t holds a collson. The numbe of dstnct k-mes hashed nto one counte (X) follows a bnomal dstbuton, X Bn(F 0 /2 w,1/), and we can use ths dstbuton to compute p c f = P[X = 1], p e = P[X = 0], and p c = P[X > 1]. Note that the expected numbe of non-empty collson-fee countes at one level s E(t (w) ) = p c f. The pesented settngs ncdentally epesent an un- favoable stuaton fo Kmelght, because E(t (8) ) E(t (9) ). If thee was a geate dffeence between E(t (w+) ), E(t (w+ 1) ) and E(t (w+ +1) ), Kmelght would choose the level w + much moe often than the othe levels and so t would have a smalle chance of choosng t (w ) > E(t (w+) ). Removng bas fom estmates of f. Ou obsevaton suggests a smple modfcaton to emove the bas fom estmates of f. In patcula, we wll use the level w + that maxmzes the expected numbe of collson-fee countes E(t (w) ), nstead of the level w that maxmzes the obseved numbe of collson-fee countes t (w). The modfed Kmelght ceates sketches and estmates F 0 n the same way as the ognal algothm. Then usng ˆF 0 and, t calculates w +, as dscussed above. Fnally, values of f ae estmated fom the obseved counts of collson-fee countes at level w + by equaton (3). Note that we do not pove analytcally that ths estmato s unbased. Smplfcatons n ou analyss may cause some small bases n the estmato, but we demonstate expementally that the estmato woks well on ou geneated data. In Fgue 4, we compae the ognal and modfed Kmelght n 300 tals of both vesons on the same data as n Secton 2.2. Whle the ognal Kmelght oveestmates f sgnfcantly, the modfed Kmelght acheves E( ˆf ) f, wthout much change n the vaance. 2.4 Evaluaton of Appoxmaton Vaance In ths secton, we estmate the vaance of ˆf. We wll consde estmates obtaned at a fxed level w (fo example
5 Appoxmate Abundance Hstogams and The Use fo Genome Sze Estmaton 31 w = w + ). Recall that E(t (w) ) = f /2 w (1 1/) F 0/2 w 1. We wll consde t (w) to follow a bnomal dstbuton Bn( f, p s ), whee p s = 1/2 w (1 1/) F 0/2 w 1. Ths smplfcaton coesponds to a smple samplng pocess n whch we sample each of f k-mes wth pobablty p s, and we dscad each k-me wth pobablty 1 p s. Note that ths appoach assumes that each k-me emans collson fee ndependently of othes, whch s not the case. Snce a andom vaable followng Bn(n, p) has vaance of np(1 p), Va(t (w) ) = f p s (1 p s ). The estmate of f s obtaned as t /p s, so ( ) t Va( ˆf ) = Va = 1 p s p 2 Va(t ) = f (1 p s ) (5) s p s Effect of medans. Kmelght chooses the estmate ˆf as a medan of estmates of t ndependent sketches: ˆf = med( ˆf (1),... ˆf (t) ). Fo a contnuous andom vaable wth a densty functon f (x) and mean x, ts sample medan fom a sample of sze n s asymptotcally 1 nomal: ( ) 1 med(x) N x, 4n f ( x) 2. (6) As the bnomal dstbuton of t (w) s a dscete dstbuton, we wll appoxmate Bn( f, p s ) wth N(µ = f p s,σ 2 = f p s (1 p s )). The densty functon of nomal 1 dstbuton n ts mean µ s. 2πσ 2 Usng appoxmatons (5), (6), vaance of ˆf selected as a medan of t nstances can be deved as follows: Va( ˆf ) 2π 4t Va( ˆf (l) ) = π 2t f (1 p s ) p s. (7) Expements. We an the modfed Kmelght n 300 tals on the data pesented n Secton 2.2. Fgue 5 shows hstogams of values of ˆf fo selected 2 values of. We compae these hstogams wth two nomal dstbutons. The best nomal ft s a Gaussan wth ts mean and standad devaton obtaned fom the obseved values of ˆf. The plotted theoetcal pedcton uses the exact value of f as ts mean and the vaance s calculated accodng to equaton (7) usng the exact values of f and F 0. The qualty of nomal appoxmaton lagely depends on the tue value of f. Accodng to ou appoxmaton, the expected numbe of countes wth value at level w + s E(t ) = f p s. Fo ths dataset, w + = 9, and thus the samplng pobablty p s s appoxmately 1/ /1000 when we use the uppe bound fom equaton (4). Fo the lowest values of f (.e. f 6, f 10 ), we have E(t ) < 1. So typcally no k-mes suvve the collsons at level 1 Even fo a sample of only 7 obsevatons dawn fom the nomal dstbuton, the elatve eo of ths appoxmaton s only about 6% [7]. 2 To select values = 6,10,13,16,32,23, we have soted the values f and selected each eghth value. We have also ncluded f 1, f 2, snce they dffe fom othe f by odes of magntude. Fgue 6: The blue and oange ponts show the standad devaton of elatve eo ( ˆf f )/ f of estmates ˆf computed by the ognal and modfed Kmelght espectvely n 300 uns on dataset fom Secton 2.2. The geen lne epesents the appoxmaton of standad devaton calculated usng equaton (7). w +, and thus t = 0 and ˆf = 0. Due to the use of medans, at least one k-me must suvve n at least fou nstances to poduce t 1. As a esult, we obtan ˆf = 0 n all tals even fo f 10 = 140. Snce the estmates ˆf ae always zeo, ou estmates of vaances fo these f ae vey mpecse. If the value f s such that E(t ) s aound 1 (.e. f 13, f 16 ), a vey small numbe of k-mes hash nto collson-fee countes at level w o w +. Theefoe the estmato ˆf can each only a lmted set of dscete values ( ˆf = 0 f no k- me suvves collsons, ˆf = 1/p s 1000 f one k-me s n collson-fee counte, ˆf = 2/p s 2000 f two k-mes suvve,... ), as t can be seen n Fgue 5. The appoxmaton wth nomal dstbuton s not pecse fo these values f, snce the dstbuton of ˆf s clealy dscete. Fnally, fo hghe f, whee E(t ) 1, the estmato ˆf takes on vaous values, and the appoxmaton wth nomal dstbuton seems easonable, as t can be seen fom the bottom ow of Fgue 5. We have also appled Kolmogoov-Smnov tests to test the nomalty of ˆf. These tests eject nomalty fo f < 20,000 wth ou dataset of 300 tals at the sgnfcance level of 5%. Oveall, we conclude that fo suffcently hgh values of f the dstbuton of ˆf can be appoxmated by Gaussan wth vaance calculated by (7). Fnally we pesent the compason of Kmelght s vaance and ts theoetcal pedcton n Fgue 6. In ths expement, ou estmate of the standad devaton of Kmelght s estmates has eo less than 5% even fo lowe f wth values aound /p s., whee the nomal appoxmaton was stll nadequate. The expement futhe suggests that an accuate estmate of vaance fo the lowest values of f < 100 would be zeo. Fgue 6 also eveals a dffeence n vaances between the ognal and modfed Kmelght s estmates fo f (100, 1000). The ognal Kmelght seaches fo a level w that maxmzes t (w) so even f only one k-me wth abundance suvves the collsons at any level of the sketch, Kmelght wll use t to estmate ˆf. We dd not nclude
6 32 M. Lpovský, T. Vnař, B. Bejová Fgue 5: Empcal pobablty denstes (blue hstogams), the best nomal fts (oange lnes) and theoetcal estmates (geen) of dstbutons of ˆf fo multple abundances. Hstogams come fom 300 uns of modfed Kmelght. Hozontal axes dsplay the values of estmates ˆf. ths effect nto the vaance estmaton, hence the estmates fo these f ae not pecse fo the ognal Kmelght. 2.5 Choce of Kmelght s Paametes Svadasan et al [8] povde theoetcal bounds on Kmelght s appoxmaton eos, but they do not povde methods fo settng paametes n pactcal scenaos. In ths secton, we show how to set the paametes n ode to acheve elatve eo bounded by ε wth pobablty 1 δ, unde the assumpton that ˆf s dstbuted accodng to the nomal appoxmaton deved n the pevous secton. We fst deve a two-sded pedcton nteval of ˆf. Let u ( ) α 2 be the ctcal value of the nomal dstbuton fo sgnfcance level α 2. By standadzaton we get ( ˆf f )/σ N(0,1), and so we have P [ ( α ) u ˆf f ( α ) ] u 1 α. 2 σ 2 If we set ε = u ( α 2 ) σ/ f, we obtan the bounds fo ˆf : P [ (1 ε) f ˆf (1 + ε) f ] 1 α. To acheve bounded eo wth pobablty at least 1 δ smultaneously fo m dffeent values of, we set α = δ m followng the Bonfeon coecton. Standad devaton σ can be calculated by the equaton (7) usng F 0, f,,t. Note that σ depends logathmcally on F 0 and thus a ough estmate of F 0 s suffcent. We estct the bounded pecson only fo suffcently lage f by settng f = F 0 /λ (λ = 1000 fo example). The estmates of lage f wll be even moe pecse. Fnally, by usng ε = u ( α 2 ) σ/ f, we can calculate eo bound ε fo a gven pobablty δ and paametes,t,λ. Ths allows us to effcently exploe vaous values of, and t and the nfluence on appoxmaton eos. Fo example, we can fnd paametes whch mnmze the appoxmaton eo fo a fxed memoy lmt. 3 Genome Sze Estmaton One motvaton fo computng k-me abundance hstogams fom sequencng data s to obtan genome sze estmates. Hee we use CovEst softwae [2] developed n ou goup to compae the accuacy of genome sze estmates poduced fom the exact and appoxmate hstogams. We use data geneated as descbed n Secton 2.2, but we vay paametes L (genome sze), c (genome coveage by eads), e (sequencng eo ate). Fo each set of genome paametes (c,e,l) we have geneated 50 data sets. Then we computed the exact hstogam usng Jellyfsh softwae [4] and two appoxmate hstogams usng the ognal and the modfed veson of Kmelght. Fnally we an CovEst on these thee hstogams usng the model wthout epeats, and we collected thee estmates of coveage ĉ j,ĉ ok,ĉ mk. In all expemental esults, we focus on the estmates of coveage ĉ. By dvdng the sum of lengths of all eads by ĉ, we can obtan estmates of genome sze ˆL. In the fst expement (top ow of Fgue 7) we nvestgate the effect of nceasng sequencng eo ates on the accuacy of coveage estmates. On exact hstogams, CovEst poduces unbased estmates of coveage wth the vaance nceasng wth eo ate. Estmates based on appoxmate hstogams ae clealy less accuate but stll acheve a elatvely good pecson. Mean eos ae consstently lowe fo modfed Kmelght than the eos of the ognal Kmelght. Pa Student s t-tests eject hypotheses mean(ĉ ok ĉ mk ) = 0 fo thee of fou pesented datasets, wth excepton of the dataset wth e = 0.05 whee the p-value s 0.3. Thus we conclude that the estmates based on modfed Kmelght s hstogams ae sgnfcantly bette than the estmates based on ognal Kmelght s hstogams. Wth modfed Kmelght, CovEst poduces estmates wth all eos bounded by 0.4%, 1%, 4%, 15% fo sequencng eo ates of 0,0.01,0.05,0.1 espectvely, and we consde these estmates suffcently accuate.
7 Appoxmate Abundance Hstogams and The Use fo Genome Sze Estmaton 33 Fgue 7: Dstbutons of coveage estmate eos ĉ c fo dffeent paametes. Each boxplot epesents 50 estmates of the coveage on dffeent geneated genomes. Blue, oange and geen boxes descbe the estmate eos wth Jellyfsh, ognal Kmelght and modfed Kmelght used to compute the k-me abundance hstogam. Note that vetcal axes use dffeent scales snce the values span acoss multple odes of magntude n dffeent datasets. As the default paametes, we use L = 10 6, c = 10 and e = In each expement, we alteed the value of one paamete, keepng the othe two fxed. Top ow: dffeent eo ates, mddle ow: dffeent coveage values, bottom ow: dffeent genome szes.
8 34 M. Lpovský, T. Vnař, B. Bejová In the second expement (mddle ow of Fgue 7), we study the accuacy fo dffeent genome coveage values. All CovEst estmates ĉ j n ou expement ange n 30%, 6%, 0.3%, < 0.1% ntevals aound the tue coveage fo coveages of 0.5, 2, 10 and 50. The vaance of ĉ j s compaable to vaance of ĉ ok,ĉ mk fo two lowe coveages, but wth nceasng coveage, estmates based on appoxmate hstogams become less accuate. Howeve, snce the maxmal eos each values of only 0.3, whch s less than 1% of the tue coveage c = 50, we also consde these estmates as useful. Modfed Kmelght sgnfcantly outpefoms the ognal Kmelght on all fou pesented datasets (vefed by t-tests). Fnally, wth nceasng genome sze (bottom ow of Fgue 7), CovEst s estmates become moe pecse on the exact hstogams, and estmates on appoxmated hstogams mantan a oughly constant pecson fo the examned genome szes. As the coveage estmate eos ae bounded by 1% fo all datasets, we also consde the appoxmate hstogams suffcently accuate fo the genome sze estmaton. Modfed Kmelght eaches smalle mean eo on the two smalle genomes than the ognal Kmelght. 4 Concluson In ths pape, we have studed the chaacte of appoxmaton eos n the k-me appoxmaton hstogams poduced by Kmelght [8]. We have dscoveed that Kmelght s estmates of ˆf ae based towads hghe values and povded a new estmate wthout ths bas. We have demonstated that fo suffcently lage values of f, the dstbuton of estmates can be well appoxmated by a nomal dstbuton. Ths appoxmaton can be used to tune the paametes of the method. We have also demonstated that appoxmate hstogams poduced by Kmelght ae suffcently accuate to poduce easonable estmates of genome sze, and that fo most settngs ou modfed veson of Kmelght leads to moe accuate genome sze estmaton than the ognal Kmelght. An mpotant avenue fo futhe eseach s to compae the accuacy CovEst esults on eal sequencng data. The use of appoxmate hstogams allows sgnfcant educton n memoy; fo example fo genome sze L = 10 8, coveage c = 50 and eo ate e = 0.1 (F 0 = ), Kmelght can poduce an appoxmate hstogam n only 83MB of memoy, wheeas Jellyfsh needs 34Gb. Note that ou modfed Kmelght uses only one of 64 Kmelght s levels to compute estmates fo all f, so t navely seems that we could mantan only ths level and educe the memoy consumpton by a facto of 64. Howeve, ou veson stll uses the full 64 levels to compute ˆF 0, whch s needed to fnd the desed level w +. We could ty to estmate ˆF 0 sepaately, patculaly, f two passes ove the data ae allowed. Fo example, we could oughly guess F 0 fom the sze of sequencng data and use fewe levels. It mght be also suffcent to estmate F 0 wth smalle hash tables. We dd not nque deepe nto ths topc, but we beleve that by such technques the memoy consumpton could be futhe deceased by a sgnfcant facto. Also note that the k-me abundance hstogam and many algothms used fo ts computaton can be genealzed to a hstogam of any nput tems. Thus the applcablty of ths topc goes beyond the feld of bonfomatcs. Acknowledgments. Ths eseach was funded by VEGA gants 1/0684/16 (BB) and 1/0719/14 (TV), and a gant fom the Slovak Reseach and Development Agency APVV Refeences [1] Z. Ba-Yossef, T.S. Jayam, R. Kuma, D. Svakuma, and L. Tevsan. Countng dstnct elements n a data steam. In Intenatonal Wokshop on Randomzaton and Appoxmaton Technques (RANDOM), pages Spnge, [2] M. Hozza, T. Vnař, and B. Bejová. How Bg s That Genome? Estmatng Genome Sze and Coveage fom k-me Abundance Specta. In Stng Pocessng and Infomaton Reteval (SPIRE), pages Spnge, [3] S. Kutz, A. Naechana, J. C. Sten, and D. Wae. A new method to compute k-me fequences and ts applcaton to annotate lage epettve plant genomes. BMC Genomcs, 9(1):517, [4] G. Maças and C. Kngsfod. A fast, lock-fee appoach fo effcent paallel countng of occuences of k-mes. Bonfomatcs, 27(6):764, [5] P. Melsted and B. V. Halldósson. Kmesteam: steamng algothms fo k-me abundance estmaton. Bonfomatcs, 30(24): , [6] P. Melsted and J. Ptchad. Effcent countng of k- mes n DNA sequences usng a bloom flte. BMC Bonfomatcs, 12(1):333, [7] P. R. Rde. Vaance of the medan of small samples fom seveal specal populatons. Jounal of the Amecan Statstcal Assocaton, 55(289): , [8] N. Svadasan, R. Snvasan, and K. Goyal. Kmelght: fast and accuate k-me abundance estmaton. CoRR, abs/ , [9] D. Wllams, W. L. Tmble, M. Shlts, F. Meye, and H. Ochman. Rapd quantfcaton of sequence epeats to esolve the sze, stuctue and contents of bacteal genomes. BMC Genomcs, 14(1):537, 2013.
Multistage Median Ranked Set Sampling for Estimating the Population Median
Jounal of Mathematcs and Statstcs 3 (: 58-64 007 ISSN 549-3644 007 Scence Publcatons Multstage Medan Ranked Set Samplng fo Estmatng the Populaton Medan Abdul Azz Jeman Ame Al-Oma and Kamaulzaman Ibahm
More informationP 365. r r r )...(1 365
SCIENCE WORLD JOURNAL VOL (NO4) 008 www.scecncewoldounal.og ISSN 597-64 SHORT COMMUNICATION ANALYSING THE APPROXIMATION MODEL TO BIRTHDAY PROBLEM *CHOJI, D.N. & DEME, A.C. Depatment of Mathematcs Unvesty
More informationA Brief Guide to Recognizing and Coping With Failures of the Classical Regression Assumptions
A Bef Gude to Recognzng and Copng Wth Falues of the Classcal Regesson Assumptons Model: Y 1 k X 1 X fxed n epeated samples IID 0, I. Specfcaton Poblems A. Unnecessay explanatoy vaables 1. OLS s no longe
More informationThe Greatest Deviation Correlation Coefficient and its Geometrical Interpretation
By Rudy A. Gdeon The Unvesty of Montana The Geatest Devaton Coelaton Coeffcent and ts Geometcal Intepetaton The Geatest Devaton Coelaton Coeffcent (GDCC) was ntoduced by Gdeon and Hollste (987). The GDCC
More informationA. Thicknesses and Densities
10 Lab0 The Eath s Shells A. Thcknesses and Denstes Any theoy of the nteo of the Eath must be consstent wth the fact that ts aggegate densty s 5.5 g/cm (ecall we calculated ths densty last tme). In othe
More information3. A Review of Some Existing AW (BT, CT) Algorithms
3. A Revew of Some Exstng AW (BT, CT) Algothms In ths secton, some typcal ant-wndp algothms wll be descbed. As the soltons fo bmpless and condtoned tansfe ae smla to those fo ant-wndp, the pesented algothms
More informationCorrespondence Analysis & Related Methods
Coespondence Analyss & Related Methods Ineta contbutons n weghted PCA PCA s a method of data vsualzaton whch epesents the tue postons of ponts n a map whch comes closest to all the ponts, closest n sense
More informationTian Zheng Department of Statistics Columbia University
Haplotype Tansmsson Assocaton (HTA) An "Impotance" Measue fo Selectng Genetc Makes Tan Zheng Depatment of Statstcs Columba Unvesty Ths s a jont wok wth Pofesso Shaw-Hwa Lo n the Depatment of Statstcs at
More informationPhysics 2A Chapter 11 - Universal Gravitation Fall 2017
Physcs A Chapte - Unvesal Gavtaton Fall 07 hese notes ae ve pages. A quck summay: he text boxes n the notes contan the esults that wll compse the toolbox o Chapte. hee ae thee sectons: the law o gavtaton,
More information8 Baire Category Theorem and Uniform Boundedness
8 Bae Categoy Theoem and Unfom Boundedness Pncple 8.1 Bae s Categoy Theoem Valdty of many esults n analyss depends on the completeness popety. Ths popety addesses the nadequacy of the system of atonal
More informationUNIT10 PLANE OF REGRESSION
UIT0 PLAE OF REGRESSIO Plane of Regesson Stuctue 0. Intoducton Ojectves 0. Yule s otaton 0. Plane of Regesson fo thee Vaales 0.4 Popetes of Resduals 0.5 Vaance of the Resduals 0.6 Summay 0.7 Solutons /
More informationGenerating Functions, Weighted and Non-Weighted Sums for Powers of Second-Order Recurrence Sequences
Geneatng Functons, Weghted and Non-Weghted Sums fo Powes of Second-Ode Recuence Sequences Pantelmon Stăncă Aubun Unvesty Montgomey, Depatment of Mathematcs Montgomey, AL 3614-403, USA e-mal: stanca@studel.aum.edu
More informationLearning the structure of Bayesian belief networks
Lectue 17 Leanng the stuctue of Bayesan belef netwoks Mlos Hauskecht mlos@cs.ptt.edu 5329 Sennott Squae Leanng of BBN Leanng. Leanng of paametes of condtonal pobabltes Leanng of the netwok stuctue Vaables:
More informationChapter Fifiteen. Surfaces Revisited
Chapte Ffteen ufaces Revsted 15.1 Vecto Descpton of ufaces We look now at the vey specal case of functons : D R 3, whee D R s a nce subset of the plane. We suppose s a nce functon. As the pont ( s, t)
More informationExact Simplification of Support Vector Solutions
Jounal of Machne Leanng Reseach 2 (200) 293-297 Submtted 3/0; Publshed 2/0 Exact Smplfcaton of Suppot Vecto Solutons Tom Downs TD@ITEE.UQ.EDU.AU School of Infomaton Technology and Electcal Engneeng Unvesty
More informationN = N t ; t 0. N is the number of claims paid by the
Iulan MICEA, Ph Mhaela COVIG, Ph Canddate epatment of Mathematcs The Buchaest Academy of Economc Studes an CECHIN-CISTA Uncedt Tac Bank, Lugoj SOME APPOXIMATIONS USE IN THE ISK POCESS OF INSUANCE COMPANY
More informationThermodynamics of solids 4. Statistical thermodynamics and the 3 rd law. Kwangheon Park Kyung Hee University Department of Nuclear Engineering
Themodynamcs of solds 4. Statstcal themodynamcs and the 3 d law Kwangheon Pak Kyung Hee Unvesty Depatment of Nuclea Engneeng 4.1. Intoducton to statstcal themodynamcs Classcal themodynamcs Statstcal themodynamcs
More informationSet of square-integrable function 2 L : function space F
Set of squae-ntegable functon L : functon space F Motvaton: In ou pevous dscussons we have seen that fo fee patcles wave equatons (Helmholt o Schödnge) can be expessed n tems of egenvalue equatons. H E,
More informationDirichlet Mixture Priors: Inference and Adjustment
Dchlet Mxtue Pos: Infeence and Adustment Xugang Ye (Wokng wth Stephen Altschul and Y Kuo Yu) Natonal Cante fo Botechnology Infomaton Motvaton Real-wold obects Independent obsevatons Categocal data () (2)
More informationKhintchine-Type Inequalities and Their Applications in Optimization
Khntchne-Type Inequaltes and The Applcatons n Optmzaton Anthony Man-Cho So Depatment of Systems Engneeng & Engneeng Management The Chnese Unvesty of Hong Kong ISDS-Kolloquum Unvestaet Wen 29 June 2009
More informationDistinct 8-QAM+ Perfect Arrays Fanxin Zeng 1, a, Zhenyu Zhang 2,1, b, Linjie Qian 1, c
nd Intenatonal Confeence on Electcal Compute Engneeng and Electoncs (ICECEE 15) Dstnct 8-QAM+ Pefect Aays Fanxn Zeng 1 a Zhenyu Zhang 1 b Lnje Qan 1 c 1 Chongqng Key Laboatoy of Emegency Communcaton Chongqng
More informationINTERVAL ESTIMATION FOR THE QUANTILE OF A TWO-PARAMETER EXPONENTIAL DISTRIBUTION
Intenatonal Jounal of Innovatve Management, Infomaton & Poducton ISME Intenatonalc0 ISSN 85-5439 Volume, Numbe, June 0 PP. 78-8 INTERVAL ESTIMATION FOR THE QUANTILE OF A TWO-PARAMETER EXPONENTIAL DISTRIBUTION
More informationVParC: A Compression Scheme for Numeric Data in Column-Oriented Databases
The Intenatonal Aab Jounal of Infomaton Technology VPaC: A Compesson Scheme fo Numec Data n Column-Oented Databases Ke Yan, Hong Zhu, and Kevn Lü School of Compute Scence and Technology, Huazhong Unvesty
More information4 Recursive Linear Predictor
4 Recusve Lnea Pedcto The man objectve of ths chapte s to desgn a lnea pedcto wthout havng a po knowledge about the coelaton popetes of the nput sgnal. In the conventonal lnea pedcto the known coelaton
More informationPARAMETER ESTIMATION FOR TWO WEIBULL POPULATIONS UNDER JOINT TYPE II CENSORED SCHEME
Sept 04 Vol 5 No 04 Intenatonal Jounal of Engneeng Appled Scences 0-04 EAAS & ARF All ghts eseed wwweaas-ounalog ISSN305-869 PARAMETER ESTIMATION FOR TWO WEIBULL POPULATIONS UNDER JOINT TYPE II CENSORED
More informationBayesian Assessment of Availabilities and Unavailabilities of Multistate Monotone Systems
Dept. of Math. Unvesty of Oslo Statstcal Reseach Repot No 3 ISSN 0806 3842 June 2010 Bayesan Assessment of Avalabltes and Unavalabltes of Multstate Monotone Systems Bent Natvg Jøund Gåsemy Tond Retan June
More informationAn Approach to Inverse Fuzzy Arithmetic
An Appoach to Invese Fuzzy Athmetc Mchael Hanss Insttute A of Mechancs, Unvesty of Stuttgat Stuttgat, Gemany mhanss@mechaun-stuttgatde Abstact A novel appoach of nvese fuzzy athmetc s ntoduced to successfully
More informationEvent Shape Update. T. Doyle S. Hanlon I. Skillicorn. A. Everett A. Savin. Event Shapes, A. Everett, U. Wisconsin ZEUS Meeting, October 15,
Event Shape Update A. Eveett A. Savn T. Doyle S. Hanlon I. Skllcon Event Shapes, A. Eveett, U. Wsconsn ZEUS Meetng, Octobe 15, 2003-1 Outlne Pogess of Event Shapes n DIS Smla to publshed pape: Powe Coecton
More informationScalars and Vectors Scalar
Scalas and ectos Scala A phscal quantt that s completel chaacteed b a eal numbe (o b ts numecal value) s called a scala. In othe wods a scala possesses onl a magntude. Mass denst volume tempeatue tme eneg
More informationPhysics 11b Lecture #2. Electric Field Electric Flux Gauss s Law
Physcs 11b Lectue # Electc Feld Electc Flux Gauss s Law What We Dd Last Tme Electc chage = How object esponds to electc foce Comes n postve and negatve flavos Conseved Electc foce Coulomb s Law F Same
More informationIf there are k binding constraints at x then re-label these constraints so that they are the first k constraints.
Mathematcal Foundatons -1- Constaned Optmzaton Constaned Optmzaton Ma{ f ( ) X} whee X {, h ( ), 1,, m} Necessay condtons fo to be a soluton to ths mamzaton poblem Mathematcally, f ag Ma{ f ( ) X}, then
More informationExperimental study on parameter choices in norm-r support vector regression machines with noisy input
Soft Comput 006) 0: 9 3 DOI 0.007/s00500-005-0474-z ORIGINAL PAPER S. Wang J. Zhu F. L. Chung Hu Dewen Expemental study on paamete choces n nom- suppot vecto egesson machnes wth nosy nput Publshed onlne:
More informationChapter 23: Electric Potential
Chapte 23: Electc Potental Electc Potental Enegy It tuns out (won t show ths) that the tostatc foce, qq 1 2 F ˆ = k, s consevatve. 2 Recall, fo any consevatve foce, t s always possble to wte the wok done
More informationBackward Haplotype Transmission Association (BHTA) Algorithm. Tian Zheng Department of Statistics Columbia University. February 5 th, 2002
Backwad Haplotype Tansmsson Assocaton (BHTA) Algothm A Fast ult-pont Sceenng ethod fo Complex Tats Tan Zheng Depatment of Statstcs Columba Unvesty Febuay 5 th, 2002 Ths s a jont wok wth Pofesso Shaw-Hwa
More informationCS649 Sensor Networks IP Track Lecture 3: Target/Source Localization in Sensor Networks
C649 enso etwoks IP Tack Lectue 3: Taget/ouce Localaton n enso etwoks I-Jeng Wang http://hng.cs.jhu.edu/wsn06/ png 006 C 649 Taget/ouce Localaton n Weless enso etwoks Basc Poblem tatement: Collaboatve
More informationEnergy in Closed Systems
Enegy n Closed Systems Anamta Palt palt.anamta@gmal.com Abstact The wtng ndcates a beakdown of the classcal laws. We consde consevaton of enegy wth a many body system n elaton to the nvese squae law and
More informationOn Maneuvering Target Tracking with Online Observed Colored Glint Noise Parameter Estimation
Wold Academy of Scence, Engneeng and Technology 6 7 On Maneuveng Taget Tacng wth Onlne Obseved Coloed Glnt Nose Paamete Estmaton M. A. Masnad-Sha, and S. A. Banan Abstact In ths pape a compehensve algothm
More informationan application to HRQoL
AlmaMate Studoum Unvesty of Bologna A flexle IRT Model fo health questonnae: an applcaton to HRQoL Seena Boccol Gula Cavn Depatment of Statstcal Scence, Unvesty of Bologna 9 th Intenatonal Confeence on
More informationMachine Learning. Spectral Clustering. Lecture 23, April 14, Reading: Eric Xing 1
Machne Leanng -7/5 7/5-78, 78, Spng 8 Spectal Clusteng Ec Xng Lectue 3, pl 4, 8 Readng: Ec Xng Data Clusteng wo dffeent ctea Compactness, e.g., k-means, mxtue models Connectvty, e.g., spectal clusteng
More informationTransport Coefficients For A GaAs Hydro dynamic Model Extracted From Inhomogeneous Monte Carlo Calculations
Tanspot Coeffcents Fo A GaAs Hydo dynamc Model Extacted Fom Inhomogeneous Monte Calo Calculatons MeKe Ieong and Tngwe Tang Depatment of Electcal and Compute Engneeng Unvesty of Massachusetts, Amhest MA
More informationProfessor Wei Zhu. 1. Sampling from the Normal Population
AMS570 Pofesso We Zhu. Samplg fom the Nomal Populato *Example: We wsh to estmate the dstbuto of heghts of adult US male. It s beleved that the heght of adult US male follows a omal dstbuto N(, ) Def. Smple
More informationA Queuing Model for an Automated Workstation Receiving Jobs from an Automated Workstation
Intenatonal Jounal of Opeatons Reseach Intenatonal Jounal of Opeatons Reseach Vol. 7, o. 4, 918 (1 A Queung Model fo an Automated Wokstaton Recevng Jobs fom an Automated Wokstaton Davd S. Km School of
More informationContact, information, consultations
ontact, nfomaton, consultatons hemsty A Bldg; oom 07 phone: 058-347-769 cellula: 664 66 97 E-mal: wojtek_c@pg.gda.pl Offce hous: Fday, 9-0 a.m. A quote of the week (o camel of the week): hee s no expedence
More information24-2: Electric Potential Energy. 24-1: What is physics
D. Iyad SAADEDDIN Chapte 4: Electc Potental Electc potental Enegy and Electc potental Calculatng the E-potental fom E-feld fo dffeent chage dstbutons Calculatng the E-feld fom E-potental Potental of a
More informationAPPLICATIONS OF SEMIGENERALIZED -CLOSED SETS
Intenatonal Jounal of Mathematcal Engneeng Scence ISSN : 22776982 Volume Issue 4 (Apl 202) http://www.mes.com/ https://stes.google.com/ste/mesounal/ APPLICATIONS OF SEMIGENERALIZED CLOSED SETS G.SHANMUGAM,
More informationVibration Input Identification using Dynamic Strain Measurement
Vbaton Input Identfcaton usng Dynamc Stan Measuement Takum ITOFUJI 1 ;TakuyaYOSHIMURA ; 1, Tokyo Metopoltan Unvesty, Japan ABSTRACT Tansfe Path Analyss (TPA) has been conducted n ode to mpove the nose
More informationRigid Bodies: Equivalent Systems of Forces
Engneeng Statcs, ENGR 2301 Chapte 3 Rgd Bodes: Equvalent Sstems of oces Intoducton Teatment of a bod as a sngle patcle s not alwas possble. In geneal, the se of the bod and the specfc ponts of applcaton
More informationLASER ABLATION ICP-MS: DATA REDUCTION
Lee, C-T A Lase Ablaton Data educton 2006 LASE ABLATON CP-MS: DATA EDUCTON Cn-Ty A. Lee 24 Septembe 2006 Analyss and calculaton of concentatons Lase ablaton analyses ae done n tme-esolved mode. A ~30 s
More informationEfficiency of the principal component Liu-type estimator in logistic
Effcency of the pncpal component Lu-type estmato n logstc egesson model Jbo Wu and Yasn Asa 2 School of Mathematcs and Fnance, Chongqng Unvesty of Ats and Scences, Chongqng, Chna 2 Depatment of Mathematcs-Compute
More informationEngineering Mechanics. Force resultants, Torques, Scalar Products, Equivalent Force systems
Engneeng echancs oce esultants, Toques, Scala oducts, Equvalent oce sstems Tata cgaw-hll Companes, 008 Resultant of Two oces foce: acton of one bod on anothe; chaacteed b ts pont of applcaton, magntude,
More informationGoodness-of-fit for composite hypotheses.
Section 11 Goodness-of-fit fo composite hypotheses. Example. Let us conside a Matlab example. Let us geneate 50 obsevations fom N(1, 2): X=nomnd(1,2,50,1); Then, unning a chi-squaed goodness-of-fit test
More informationPhysics Exam 3
Physcs 114 1 Exam 3 The numbe of ponts fo each secton s noted n backets, []. Choose a total of 35 ponts that wll be gaded that s you may dop (not answe) a total of 5 ponts. Clealy mak on the cove of you
More informationPhysics Exam II Chapters 25-29
Physcs 114 1 Exam II Chaptes 5-9 Answe 8 of the followng 9 questons o poblems. Each one s weghted equally. Clealy mak on you blue book whch numbe you do not want gaded. If you ae not sue whch one you do
More informationPHYS 705: Classical Mechanics. Derivation of Lagrange Equations from D Alembert s Principle
1 PHYS 705: Classcal Mechancs Devaton of Lagange Equatons fom D Alembet s Pncple 2 D Alembet s Pncple Followng a smla agument fo the vtual dsplacement to be consstent wth constants,.e, (no vtual wok fo
More informationThe Substring Search Problem
The Substing Seach Poblem One algoithm which is used in a vaiety of applications is the family of substing seach algoithms. These algoithms allow a use to detemine if, given two chaacte stings, one is
More informationA Study about One-Dimensional Steady State. Heat Transfer in Cylindrical and. Spherical Coordinates
Appled Mathematcal Scences, Vol. 7, 03, no. 5, 67-633 HIKARI Ltd, www.m-hka.com http://dx.do.og/0.988/ams.03.38448 A Study about One-Dmensonal Steady State Heat ansfe n ylndcal and Sphecal oodnates Lesson
More informationRe-Ranking Retrieval Model Based on Two-Level Similarity Relation Matrices
Intenatonal Jounal of Softwae Engneeng and Its Applcatons, pp. 349-360 http://dx.do.og/10.1457/sea.015.9.1.31 Re-Rankng Reteval Model Based on Two-Level Smlaty Relaton Matces Hee-Ju Eun Depatment of Compute
More informationTHE REGRESSION MODEL OF TRANSMISSION LINE ICING BASED ON NEURAL NETWORKS
The 4th Intenatonal Wokshop on Atmosphec Icng of Stuctues, Chongqng, Chna, May 8 - May 3, 20 THE REGRESSION MODEL OF TRANSMISSION LINE ICING BASED ON NEURAL NETWORKS Sun Muxa, Da Dong*, Hao Yanpeng, Huang
More information(8) Gain Stage and Simple Output Stage
EEEB23 Electoncs Analyss & Desgn (8) Gan Stage and Smple Output Stage Leanng Outcome Able to: Analyze an example of a gan stage and output stage of a multstage amplfe. efeence: Neamen, Chapte 11 8.0) ntoducton
More informationLarge scale magnetic field generation by accelerated particles in galactic medium
Lage scale magnetc feld geneaton by acceleated patcles n galactc medum I.N.Toptygn Sant Petesbug State Polytechncal Unvesty, depatment of Theoetcal Physcs, Sant Petesbug, Russa 2.Reason explonatons The
More informationAN EXACT METHOD FOR BERTH ALLOCATION AT RAW MATERIAL DOCKS
AN EXACT METHOD FOR BERTH ALLOCATION AT RAW MATERIAL DOCKS Shaohua L, a, Lxn Tang b, Jyn Lu c a Key Laboatoy of Pocess Industy Automaton, Mnsty of Educaton, Chna b Depatment of Systems Engneeng, Notheasten
More informationCOMPLEMENTARY ENERGY METHOD FOR CURVED COMPOSITE BEAMS
ultscence - XXX. mcocd Intenatonal ultdscplnay Scentfc Confeence Unvesty of skolc Hungay - pl 06 ISBN 978-963-358-3- COPLEENTRY ENERGY ETHOD FOR CURVED COPOSITE BES Ákos József Lengyel István Ecsed ssstant
More informationStellar Astrophysics. dt dr. GM r. The current model for treating convection in stellar interiors is called mixing length theory:
Stella Astophyscs Ovevew of last lectue: We connected the mean molecula weght to the mass factons X, Y and Z: 1 1 1 = X + Y + μ 1 4 n 1 (1 + 1) = X μ 1 1 A n Z (1 + ) + Y + 4 1+ z A Z We ntoduced the pessue
More informationRecursive Least-Squares Estimation in Case of Interval Observation Data
Recusve Least-Squaes Estmaton n Case of Inteval Obsevaton Data H. Kuttee ), and I. Neumann 2) ) Geodetc Insttute, Lebnz Unvesty Hannove, D-3067 Hannove, Gemany, kuttee@gh.un-hannove.de 2) Insttute of Geodesy
More informationOptimal System for Warm Standby Components in the Presence of Standby Switching Failures, Two Types of Failures and General Repair Time
Intenatonal Jounal of ompute Applcatons (5 ) Volume 44 No, Apl Optmal System fo Wam Standby omponents n the esence of Standby Swtchng Falues, Two Types of Falues and Geneal Repa Tme Mohamed Salah EL-Shebeny
More informationOptimization Methods: Linear Programming- Revised Simplex Method. Module 3 Lecture Notes 5. Revised Simplex Method, Duality and Sensitivity analysis
Optmzaton Meods: Lnea Pogammng- Revsed Smple Meod Module Lectue Notes Revsed Smple Meod, Dualty and Senstvty analyss Intoducton In e pevous class, e smple meod was dscussed whee e smple tableau at each
More informationKEYWORDS: survey sampling; prediction; estimation; imputation; variance estimation; ratios of totals
Usng Pedcton-Oented Softwae fo Suvey Estmaton - Pat II: Ratos of Totals James R. Knaub, J. US Dept. of Enegy, Enegy Infomaton dmnstaton, EI-53.1 STRCT: Ths atcle s an extenson of Knaub (1999), Usng Pedcton-Oented
More informationTheo K. Dijkstra. Faculty of Economics and Business, University of Groningen, Nettelbosje 2, 9747 AE Groningen THE NETHERLANDS
RESEARCH ESSAY COSISE PARIAL LEAS SQUARES PAH MODELIG heo K. Djksta Faculty of Economcs and Busness, Unvesty of Gonngen, ettelbosje, 9747 AE Gonngen HE EHERLADS {t.k.djksta@ug.nl} Jög Hensele Faculty of
More informationLinks in edge-colored graphs
Lnks n edge-coloed gaphs J.M. Becu, M. Dah, Y. Manoussaks, G. Mendy LRI, Bât. 490, Unvesté Pas-Sud 11, 91405 Osay Cedex, Fance Astact A gaph s k-lnked (k-edge-lnked), k 1, f fo each k pas of vetces x 1,
More informationiclicker Quiz a) True b) False Theoretical physics: the eternal quest for a missing minus sign and/or a factor of two. Which will be an issue today?
Clce Quz I egsteed my quz tansmtte va the couse webste (not on the clce.com webste. I ealze that untl I do so, my quz scoes wll not be ecoded. a Tue b False Theoetcal hyscs: the etenal quest fo a mssng
More informationGENERALIZATION OF AN IDENTITY INVOLVING THE GENERALIZED FIBONACCI NUMBERS AND ITS APPLICATIONS
#A39 INTEGERS 9 (009), 497-513 GENERALIZATION OF AN IDENTITY INVOLVING THE GENERALIZED FIBONACCI NUMBERS AND ITS APPLICATIONS Mohaad Faokh D. G. Depatent of Matheatcs, Fedows Unvesty of Mashhad, Mashhad,
More informationRemember: When an object falls due to gravity its potential energy decreases.
Chapte 5: lectc Potental As mentoned seveal tmes dung the uate Newton s law o gavty and Coulomb s law ae dentcal n the mathematcal om. So, most thngs that ae tue o gavty ae also tue o electostatcs! Hee
More informationError-rate Estimation with Ones Counting
UC TCHNICA RPORT CNG-8- ng Hseh epatment of lectcal ngneeng teb chool of ngneeng Unvesty of ouen Calfona o-ate stmaton w Ones Countng Zhaolang Pan elvn A Beue Abstact W I ccut featue sze scalng down, t
More informationON THE FRESNEL SINE INTEGRAL AND THE CONVOLUTION
IJMMS 3:37, 37 333 PII. S16117131151 http://jmms.hndaw.com Hndaw Publshng Cop. ON THE FRESNEL SINE INTEGRAL AND THE CONVOLUTION ADEM KILIÇMAN Receved 19 Novembe and n evsed fom 7 Mach 3 The Fesnel sne
More informationSome Approximate Analytical Steady-State Solutions for Cylindrical Fin
Some Appoxmate Analytcal Steady-State Solutons fo Cylndcal Fn ANITA BRUVERE ANDRIS BUIIS Insttute of Mathematcs and Compute Scence Unvesty of Latva Rana ulv 9 Rga LV459 LATVIA Astact: - In ths pape we
More informationStudy on Vibration Response Reduction of Bladed Disk by Use of Asymmetric Vane Spacing (Study on Response Reduction of Mistuned Bladed Disk)
Intenatonal Jounal of Gas ubne, Populson and Powe Systems Febuay 0, Volume 4, Numbe Study on Vbaton Response Reducton of Bladed Dsk by Use of Asymmetc Vane Spacng (Study on Response Reducton of Mstuned
More informationPart V: Velocity and Acceleration Analysis of Mechanisms
Pat V: Velocty an Acceleaton Analyss of Mechansms Ths secton wll evew the most common an cuently pactce methos fo completng the knematcs analyss of mechansms; escbng moton though velocty an acceleaton.
More informationConstraint Score: A New Filter Method for Feature Selection with Pairwise Constraints
onstant Scoe: A New Flte ethod fo Featue Selecton wth Pawse onstants Daoqang Zhang, Songcan hen and Zh-Hua Zhou Depatment of ompute Scence and Engneeng Nanjng Unvesty of Aeonautcs and Astonautcs, Nanjng
More informationPhysics 202, Lecture 2. Announcements
Physcs 202, Lectue 2 Today s Topcs Announcements Electc Felds Moe on the Electc Foce (Coulomb s Law The Electc Feld Moton of Chaged Patcles n an Electc Feld Announcements Homewok Assgnment #1: WebAssgn
More informationImproving the efficiency of the ratio/product estimators of the population mean in stratified random samples
STATISTICS RESEARCH ARTICLE Impovng the effcency of the ato/poduct estmatos of the populaton mean statfed andom samples Receved: 10 Decembe 2017 Accepted: 08 July 2018 Fst Publshed 16 July 2018 *Coespondng
More informationA NOTE ON ELASTICITY ESTIMATION OF CENSORED DEMAND
Octobe 003 B 003-09 A NOT ON ASTICITY STIATION OF CNSOD DAND Dansheng Dong an Hay. Kase Conell nvesty Depatment of Apple conomcs an anagement College of Agcultue an fe Scences Conell nvesty Ithaca New
More informationGENERALIZED MULTIVARIATE EXPONENTIAL TYPE (GMET) ESTIMATOR USING MULTI-AUXILIARY INFORMATION UNDER TWO-PHASE SAMPLING
Pak. J. Statst. 08 Vol. (), 9-6 GENERALIZED MULTIVARIATE EXPONENTIAL TYPE (GMET) ESTIMATOR USING MULTI-AUXILIARY INFORMATION UNDER TWO-PHASE SAMPLING Ayesha Ayaz, Zahoo Ahmad, Aam Sanaullah and Muhammad
More informationUNIVERSITY OF TRENTO FREQUENCY DOMAIN TESTING OF WAVEFORM DIGITIZERS. Dario Petri. March Technical Report # DIT
UIVERSITY OF TRETO DEPARTMET OF IFORMATIO AD COMMUICATIO TECHOLOGY 385 Povo Tento (Italy), Va Sommave 4 http://www.dt.untn.t FREQUECY DOMAI TESTIG OF WAVEFORM DIGITIZERS Dao Pet Mach 4 Techncal Repot #
More informationAmplifier Constant Gain and Noise
Amplfe Constant Gan and ose by Manfed Thumm and Wene Wesbeck Foschungszentum Kalsuhe n de Helmholtz - Gemenschaft Unvestät Kalsuhe (TH) Reseach Unvesty founded 85 Ccles of Constant Gan (I) If s taken to
More informationESTIMATING A TAIL EXPONENT BY MODELLING DEPARTURE FROM A PARETO DISTRIBUTION
The Annals of Statstcs 999, Vol. 7, No., 760 78 ESTIMATING A TAIL EXPONENT BY MODELLING DEPARTURE FROM A PARETO DISTRIBUTION BY ANDREY FEUERVERGER AND PETER HALL Unvesty of Toonto and Unvesty of Toonto
More informationMinimal Detectable Biases of GPS observations for a weighted ionosphere
LETTER Eath Planets Space, 52, 857 862, 2000 Mnmal Detectable Bases of GPS obsevatons fo a weghted onosphee K. de Jong and P. J. G. Teunssen Depatment of Mathematcal Geodesy and Postonng, Delft Unvesty
More informationA Tutorial on Low Density Parity-Check Codes
A Tutoal on Low Densty Paty-Check Codes Tuan Ta The Unvesty of Texas at Austn Abstact Low densty paty-check codes ae one of the hottest topcs n codng theoy nowadays. Equpped wth vey fast encodng and decodng
More informationMachine Learning 4771
Machne Leanng 4771 Instucto: Tony Jebaa Topc 6 Revew: Suppot Vecto Machnes Pmal & Dual Soluton Non-sepaable SVMs Kenels SVM Demo Revew: SVM Suppot vecto machnes ae (n the smplest case) lnea classfes that
More informationPHYS Week 5. Reading Journals today from tables. WebAssign due Wed nite
PHYS 015 -- Week 5 Readng Jounals today fom tables WebAssgn due Wed nte Fo exclusve use n PHYS 015. Not fo e-dstbuton. Some mateals Copyght Unvesty of Coloado, Cengage,, Peason J. Maps. Fundamental Tools
More informationPattern Analyses (EOF Analysis) Introduction Definition of EOFs Estimation of EOFs Inference Rotated EOFs
Patten Analyses (EOF Analyss) Intoducton Defnton of EOFs Estmaton of EOFs Infeence Rotated EOFs . Patten Analyses Intoducton: What s t about? Patten analyses ae technques used to dentfy pattens of the
More informationA STUDY OF SOME METHODS FOR FINDING SMALL ZEROS OF POLYNOMIAL CONGRUENCES APPLIED TO RSA
Jounal of Mathematcal Scences: Advances and Applcatons Volume 38, 06, Pages -48 Avalable at http://scentfcadvances.co.n DOI: http://dx.do.og/0.864/jmsaa_700630 A STUDY OF SOME METHODS FOR FINDING SMALL
More informationDepartment of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution
Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable
More informationMultipole Radiation. March 17, 2014
Multpole Radaton Mach 7, 04 Zones We wll see that the poblem of hamonc adaton dvdes nto thee appoxmate egons, dependng on the elatve magntudes of the dstance of the obsevaton pont,, and the wavelength,
More informationComparison of Regression Lines
STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence
More informationSURVEY OF APPROXIMATION ALGORITHMS FOR SET COVER PROBLEM. Himanshu Shekhar Dutta. Thesis Prepared for the Degree of MASTER OF SCIENCE
SURVEY OF APPROXIMATION ALGORITHMS FOR SET COVER PROBLEM Hmanshu Shekha Dutta Thess Pepaed fo the Degee of MASTER OF SCIENCE UNIVERSITY OF NORTH TEXAS Decembe 2009 APPROVED: Fahad Shahokh, Mao Pofesso
More informationOnline Appendix to Position Auctions with Budget-Constraints: Implications for Advertisers and Publishers
Onlne Appendx to Poston Auctons wth Budget-Constants: Implcatons fo Advetses and Publshes Lst of Contents A. Poofs of Lemmas and Popostons B. Suppotng Poofs n the Equlbum Devaton B.1. Equlbum wth Low Resevaton
More informationThe M 2 -tree: Processing Complex Multi-Feature Queries with Just One Index
The M -tee: Pocessng Complex Mult-Featue Quees wth Just ne Index Paolo Cacca, Maco Patella DEIS - CSITE-CNR, Unvesty of Bologna, Italy fpcacca,mpatellag@des.unbo.t Abstact Motvated by the needs fo effcent
More informationState Estimation. Ali Abur Northeastern University, USA. Nov. 01, 2017 Fall 2017 CURENT Course Lecture Notes
State Estmaton Al Abu Notheasten Unvesty, USA Nov. 0, 07 Fall 07 CURENT Couse Lectue Notes Opeatng States of a Powe System Al Abu NORMAL STATE SECURE o INSECURE RESTORATIVE STATE EMERGENCY STATE PARTIAL
More informationMonte Carlo comparison of back-propagation, conjugate-gradient, and finite-difference training algorithms for multilayer perceptrons
Rocheste Insttute of Technology RIT Schola Woks Theses Thess/Dssetaton Collectons 20 Monte Calo compason of back-popagaton, conugate-gadent, and fnte-dffeence tanng algothms fo multlaye peceptons Stephen
More information4 SingularValue Decomposition (SVD)
/6/00 Z:\ jeh\self\boo Kannan\Jan-5-00\4 SVD 4 SngulaValue Decomposton (SVD) Chapte 4 Pat SVD he sngula value decomposton of a matx s the factozaton of nto the poduct of thee matces = UDV whee the columns
More information