Optimal Workload-based Weighted Wavelet Synopses


Yossi Matias
School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel

Daniel Urieli
School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel

Abstract

In recent years wavelets were shown to be effective data synopses. We are concerned with the problem of efficiently finding wavelet synopses for massive data sets, in situations where information about the query workload is available. We present linear-time, I/O-optimal algorithms for building optimal workload-based wavelet synopses for point queries. The synopses are based on a novel construction of weighted inner products, and use weighted wavelets that are adapted to those products. The synopses are optimal in the sense that the subset of retained coefficients is the best possible for the bases in use, with respect to either the mean-squared absolute or relative error. For the latter, this is the first optimal wavelet synopsis even for the regular, non-workload-based case. Experimental results demonstrate the advantage obtained by the new optimal wavelet synopses, as well as the robustness of the synopses to deviations in the actual query workload.

1 Introduction

In recent years there has been increasing attention to the development and study of data synopses, as effective means for addressing performance issues in massive data sets. Data synopses are concise representations of data sets, meant to effectively support approximate queries to the represented data sets [10]. A primary constraint of a data synopsis is its size. The effectiveness of a data synopsis is measured by the accuracy of the answers it provides, as well as by its response time and its construction time. Several different synopses were introduced and studied, including random samples, sketches, and different types of histograms.
Recently, wavelet-based synopses were introduced and shown to be a powerful tool for building effective data synopses for various applications, including selectivity estimation for query optimization in DBMS, approximate query processing in OLAP applications, and more (see [16, 20, 21, 2, 6, 9, 8] and references therein). The general idea of wavelet-based approximations is to transform a given data vector of size N into a representation with respect to a wavelet basis (this is called a wavelet transform), and approximate it using only M wavelet basis vectors, by retaining only M coefficients from the linear combination that spans the data vector (coefficient thresholding). The linear combination

(Research partly supported by a grant from the Israel Science Foundation. Contact author.)

that uses only M coefficients (and assumes that all other coefficients are zero) defines a new vector that approximates the original vector, using less space. This is called an M-term approximation, which defines a wavelet synopsis of size M.

Wavelet synopses. Wavelets were traditionally used to compress some data set, where the purpose is to reconstruct, at a later time, an approximation of the whole data using the set of retained coefficients. The situation is a little different when using wavelets for building synopses in database systems [16]: in this case only portions of the data are reconstructed each time, in response to user queries, rather than the whole data at once. As a result, portions of the data that are used for answering frequent queries are reconstructed more frequently than portions of the data that correspond to rare queries. Therefore, the approximation error is measured over the multi-set of actual queries, rather than over the data itself. Another aspect of the use of wavelets in database systems is that due to the large data sizes in databases (giga-, tera- and peta-bytes), the efficiency of building wavelet synopses is of primary importance. Disk I/Os should be minimized as much as possible, and non-linear-time algorithms may be unacceptable.

Optimal wavelet synopses. The main advantage of transforming the data into a representation with respect to a wavelet basis is that for data vectors containing similar values, many wavelet coefficients tend to have very small values. Thus, eliminating such small coefficients introduces only small errors when reconstructing the original data, resulting in a very effective form of lossy data compression. Generally speaking, we can characterize a wavelet approximation by three attributes: how the approximation error is measured, what wavelet basis is used, and how coefficient thresholding is done. Many bases were suggested and used in the traditional wavelets literature.
Given a basis with respect to which the transform is done, the selection of coefficients that are retained in the wavelet synopsis may have a significant impact on the approximation error. The goal is therefore to select a subset of M coefficients that minimizes some approximation-error measure. This subset is called an optimal wavelet synopsis, with respect to the chosen error measure. While there has been considerable work on wavelet synopses and their applications [16, 20, 21, 2, 6, 9, 8], so far there were only a few optimality results. The first one is a linear-time Parseval-based algorithm, which was used in the traditional wavelets literature (e.g. [12]), where the error was measured over the data. This algorithm minimizes the L2 norm of the error vector, and equivalently it minimizes the mean-squared absolute error over all possible point queries. No algorithm that minimizes the mean-squared relative error over all possible point queries was known. The second one, introduced recently [9], is a polynomial-time (O(N^2 M log M)) algorithm that minimizes the max relative or absolute error over all possible point queries. Another optimality result is a polynomial-time dynamic-programming algorithm that obtains an optimal wavelet synopsis over multiple measures [6]. The synopsis is optimal w.r.t. an error metric defined as a weighted combination of L2 norms over the multiple measures (this weighted combination has no relation to the notion of weighted wavelets of this paper).

Workload-based wavelet synopses. In recent years there is increased interest in workload-based synopses: synopses that are adapted to a given query workload, with the assumption that the workload represents (approximately) a probability distribution from which future queries will be taken. Chaudhuri et al. [4] argue that identifying an appropriate precomputed sample that avoids large errors on an arbitrary query is virtually impossible. To minimize the effects of this problem,

previous studies have proposed using the workload to guide the process of selecting samples [1, 3, 7]. By picking a sample that is tuned to the given workload, we can reduce the error over frequent (or otherwise important) queries in the workload. In [4], the authors formulate the problem of pre-computing a sample as an optimization problem, whose goal is to pick a sample that minimizes the error for the given workload. Recently, workload-based wavelet synopses were proposed [14, 18]. Using an adaptive-greedy algorithm, the query-workload information was used during the thresholding process in order to build a wavelet synopsis that decreases the error w.r.t. the query workload. While these workload-based wavelet synopses demonstrate significant improvement with respect to prior synopses, they are not optimal. In this paper, we address the problem of efficiently finding optimal workload-based wavelet synopses.

1.1 Contributions

We introduce efficient algorithms for finding optimal workload-based wavelet synopses using weighted Haar (WH) wavelets, for workloads of point queries. Our main contributions are:

- Linear-time, I/O-optimal algorithms that find optimal Workload-based Weighted Wavelet (WWW) synopses (1):
  - An optimal synopsis w.r.t. the workload-based mean-squared absolute error (WB-MSE).
  - An optimal synopsis w.r.t. the workload-based mean-squared relative error (WB-MRE).
  Equivalently, the algorithms minimize the expected squared (absolute or relative) error over a point query taken from a given distribution.
- The WB-MRE algorithm, used with a uniform workload, is also the first algorithm that minimizes the mean-squared relative error over the data values, with respect to a wavelet basis.
- Both WWW synopses are also optimal with respect to enhanced wavelet synopses, which allow changing the values of the synopses' coefficients to arbitrary values.
- Experimental results show the advantage of our synopses with respect to existing synopses. The synopses are robust to deviations from the pre-defined workload, as demonstrated by our experiments.
The above results were obtained using the following novel techniques. We define the problem of finding optimal workload-based wavelet synopses in terms of a weighted norm, a weighted inner product, and a weighted-inner-product space. This enables linear-time, I/O-optimal algorithms for building optimal workload-based wavelet synopses. The approach of using a weighted inner product can also be applied to the general case in which each data point is given a different priority, representing its significance (an example is shown in Sec. 6). Using these weights, one can find a weighted-wavelet basis, and an optimal weighted wavelet synopsis, in linear time, with O(N/B) I/Os.

(1) No relation whatsoever to the world-wide-web.

We introduce the use of weighted wavelets for data synopses. Using weighted wavelets [5, 11] enables finding optimal workload-based wavelet synopses efficiently. In contrast, it is not known how to efficiently obtain optimal workload-based wavelet synopses with respect to the Haar basis. If we ignore the efficiency of finding a synopsis, the Haar basis is as good as the weighted Haar basis for approximation.

In the wavelets literature (e.g. [12]), wavelets are used to approximate a given signal, which is treated as a vector in an inner-product space. Since an inner product defines an L2 norm, the approximation error is measured as the L2 norm of the error vector, which is the difference between the approximated vector and the approximating vector. Many wavelet bases were used for approximation, as different bases are adequate for approximating different collections of data vectors. By using an orthonormal wavelet basis, an optimal coefficient thresholding can be achieved in linear time, based on Parseval's formula. When using a non-orthogonal wavelet basis, or measuring the error using other norms (e.g. L∞), it is not known whether an optimal coefficient thresholding can be found efficiently, so usually non-optimal greedy algorithms are used in practice.

A WH basis is a generalization of the standard Haar basis, which is typically used for wavelet synopses due to its simplicity. There are several attributes by which a wavelet basis is characterized, which affect the quality of the approximations achieved using this basis (for a full discussion, see [12]). These attributes are: the set of nested spaces of increasing resolution which the basis spans, the number of vanishing moments of the basis, and its compact support (if it exists). Both the Haar basis and a WH basis span the same subsets of nested spaces, have one vanishing moment, and a compact support of size 1. The Haar basis is orthonormal for a uniform workload of point queries; hence it is optimal for the MSE error measure.
The WH basis is orthonormal with respect to the weighted inner product defined by the problem of finding optimal workload-based wavelet synopses. As a result, an optimal workload-based synopsis with respect to a WH basis is achieved efficiently, based on Parseval's formula, while for the Haar basis no efficient optimal thresholding algorithm is known, in cases other than a uniform workload.

1.2 Paper outline

The rest of the paper is structured as follows. In Sec. 2 we describe the basics of wavelet-based synopses. In Sec. 3 we describe the basic ideas we rely on in our development, including the workload-based error metrics and optimal thresholding in orthonormal bases. In Sec. 4 we define the problem of finding optimal workload-based wavelet synopses in terms of a weighted inner product, and solve it using an orthonormal basis. In Sec. 5 we describe the optimal algorithm for minimizing the WB-MSE, which is based on the construction of Sec. 4. In Sec. 6 we extend the algorithm to work for the WB-MRE. In Sec. 7 we present experimental results, and in Sec. 8 we draw our conclusions.

2 Wavelets basics

In this section we start by presenting Haar wavelets, and continue by presenting wavelet-based synopses, obtained by a thresholding process, described in Sec. 2.2. The error tree structure is presented next (Sec. 2.3), along with the description of the reconstruction of original data from the wavelet synopses in Sec. 2.4. Wavelets are a mathematical tool for the hierarchical decomposition of functions in a space-efficient manner. Wavelets represent a function in terms of a coarse overall shape, plus details that

range from coarse to fine. Regardless of whether the function of interest is an image, a curve, or a surface, wavelets offer an elegant technique for representing the various levels of detail of the function in a space-efficient manner.

2.1 One-dimensional Haar wavelets

Haar wavelets are conceptually the simplest wavelet basis functions, and were thus used in previous works on wavelet synopses. They are the fastest to compute and the easiest to implement. We focus on them for purposes of exposition in this paper. To illustrate how Haar wavelets work, we start with a simple example borrowed from [16]. Suppose we have a one-dimensional signal of N = 8 data items: S = [2, 2, 0, 2, 3, 5, 4, 4]. We show how the Haar wavelet transform is done over S. We first average the signal values, pairwise, to get a new lower-resolution signal with values [2, 1, 4, 4]. That is, the first two values in the original signal (2 and 2) average to 2, the second two values (0 and 2) average to 1, and so on. We also store the pairwise differences of the original values (divided by 2) as detail coefficients. In the above example, the four detail coefficients are (2 - 2)/2 = 0, (0 - 2)/2 = -1, (3 - 5)/2 = -1, and (4 - 4)/2 = 0. It is easy to see that the original values can be recovered from the averages and differences. This was one phase of the Haar wavelet transform. By repeating this process recursively on the averages, we get the full Haar wavelet transform (Table 1). We define the wavelet transform (also called the wavelet decomposition) of the original eight-value signal to be the single coefficient representing the overall average of the original signal, followed by the detail coefficients in order of increasing resolution. Thus, for the one-dimensional Haar basis, the wavelet transform of our signal is given by

S' = [2 3/4, -1 1/4, 1/2, 0, 0, -1, -1, 0]

Resolution  Averages                  Detail Coefficients
8           [2, 2, 0, 2, 3, 5, 4, 4]
4           [2, 1, 4, 4]              [0, -1, -1, 0]
2           [1.5, 4]                  [0.5, 0]
1           [2.75]                    [-1.25]

Table 1: Haar Wavelet Decomposition

The individual entries are called the wavelet coefficients.
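The per-level averaging and differencing described above can be sketched as follows (a minimal Python illustration; the function name is ours, not from the paper):

```python
def haar_transform(signal):
    """One-dimensional Haar wavelet decomposition: repeatedly replace the
    signal by pairwise averages, storing pairwise differences (divided by 2)
    as detail coefficients, from finest to coarsest resolution."""
    s = list(signal)
    details = []
    while len(s) > 1:
        avgs = [(s[i] + s[i + 1]) / 2 for i in range(0, len(s), 2)]
        dets = [(s[i] - s[i + 1]) / 2 for i in range(0, len(s), 2)]
        details = dets + details  # coarser details precede finer ones
        s = avgs
    # [overall average] followed by details in order of increasing resolution
    return s + details

print(haar_transform([2, 2, 0, 2, 3, 5, 4, 4]))
# [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
```

This reproduces the rows of Table 1: the overall average 2.75, then the detail coefficients -1.25, [0.5, 0], and [0, -1, -1, 0].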
The wavelet decomposition is very efficient computationally, requiring only O(N) CPU time and O(N/B) I/Os to compute for a signal of N values, where B is the disk-block size. No information has been gained or lost by this process. The original signal has eight values, and so does the transform. Given the transform, we can reconstruct the exact signal by recursively adding and subtracting the detail coefficients from the next-lower resolution. In fact, we have transformed the signal S into a representation with respect to another basis of R^8: the Haar wavelet basis. A detailed discussion can be found, for example, in [19].

2.2 Thresholding

Given a limited amount of storage for maintaining a wavelet synopsis of a data array A (or equivalently a vector S), we can only retain a certain number M of the coefficients stored in

the wavelet decomposition of A. The remaining coefficients are implicitly set to 0. The goal of coefficient thresholding is to determine the best subset of M coefficients to retain, so that some overall error measure in the approximation is minimized. One advantage of the wavelet transform is that in many cases a large number of the detail coefficients turn out to be very small in magnitude. Truncating these small coefficients from the representation (i.e., replacing each one by 0) introduces only small errors in the reconstructed signal. We can approximate the original signal effectively by keeping only the most significant coefficients.

For a given input sequence d_0, ..., d_{N-1}, we can measure the error of approximation in several ways. Let the i-th data value be d_i. Let q_i be the i-th point query, whose value is d_i. Let d̂_i be the estimated result of d_i. We use the following error measure for the absolute error over the i-th data value:

e_i = e(q_i) = |d_i - d̂_i|

Once we have the error measure for representing the errors of individual data values, we would like to measure the norm of the vector of errors e = (e_0, ..., e_{N-1}). The standard way is to use the squared L2 norm of e divided by N, which is called the mean squared error:

MSE(e) = (1/N) Σ_{i=0}^{N-1} e_i^2

We use the terms MSE and L2 norm interchangeably during our development, since they are equivalent up to a positive multiplicative constant.

The basic thresholding algorithm, based on Parseval's formula, is as follows: let α_0, ..., α_{N-1} be the wavelet coefficients, and for each α_i let level(α_i) be the resolution level of α_i. The detail coefficients are normalized by dividing each coefficient by sqrt(2^{level(α_i)}), reflecting the fact that coefficients at the lower resolutions are less important than the coefficients at the higher resolutions. This process actually turns the wavelet coefficients into the coefficients of an orthonormal basis (and is thus called normalization). The M largest normalized coefficients are retained. The remaining N - M coefficients are implicitly replaced by zero.
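A sketch of this normalize-and-keep-M-largest rule, using the flat transform array of Sec. 2.1 (helper names and the tie-breaking order are ours; only the ranking matters, so the common 1/sqrt(N) factor is omitted):

```python
import math

def threshold(coeffs, M):
    """Parseval-based thresholding: rank coefficients by absolute value
    after dividing by sqrt(2^level), keep the M largest, zero the rest.
    In the flat array, the root average and the coarsest detail are at
    level 0; the detail at index i >= 1 is at level floor(log2(i))."""
    n = len(coeffs)

    def norm_weight(i):
        level = 0 if i == 0 else int(math.log2(i))
        return 1 / math.sqrt(2 ** level)

    order = sorted(range(n), key=lambda i: abs(coeffs[i]) * norm_weight(i),
                   reverse=True)
    keep = set(order[:M])
    return [c if i in keep else 0 for i, c in enumerate(coeffs)]

print(threshold([2.75, -1.25, 0.5, 0, 0, -1, -1, 0], 3))
# [2.75, -1.25, 0, 0, 0, -1, 0, 0]
```

For M = 3 on the example signal, the retained coefficients are the overall average, the coarsest detail, and one of the two finest details of magnitude 1 (whose normalized values, 0.5, exceed the normalized 0.354 of the coefficient 0.5).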
This deterministic process provably minimizes the L2 norm of the vector of errors defined above, based on Parseval's formula (see Sec. 3).

2.3 Error tree

The wavelet decomposition procedure followed by any thresholding can be represented by an error tree [16]. Fig. 1 presents the error tree for the above example. Each internal node of the error tree is associated with a wavelet coefficient, and each leaf is associated with an original signal value. Internal nodes and leaves are labelled separately by 0, 1, ..., N - 1. For example, the root is an internal node with label 0, and its node value is 2.75 in Fig. 1. For convenience, we shall use "node" and "node value" interchangeably. The construction of the error tree exactly mirrors the wavelet transform procedure. It is a bottom-up process. First, leaves are assigned original signal values from left to right. Then wavelet coefficients are computed, level by level, and assigned to internal nodes.

2.4 Reconstruction of original data

Given an error tree T and an internal node t of T, t ≠ α_0, we let leftleaves(t) (rightleaves(t)) denote the set of leaf (i.e., data) nodes in the subtree rooted at t's left (resp., right) child. Also, given any (internal or leaf) node u, we let path(u) be the set of all (internal) nodes in T that are

Figure 1: Error tree for N = 8

proper ancestors of u (i.e., the nodes on the path from u to the root of T, including the root but not u) with nonzero coefficients. Finally, for any two leaf nodes d_l and d_h, we denote by d(l : h) the range sum Σ_{i=l}^{h} d_i.

Using the error tree representation T, we can outline the following reconstruction properties of the Haar wavelet decomposition [16]:

Single value. The reconstruction of any data value d_i depends only on the values of the nodes in path(d_i):

d_i = Σ_{α_j ∈ path(d_i)} δ_ij α_j

where δ_ij = +1 if d_i ∈ leftleaves(α_j) or j = 0, and δ_ij = -1 otherwise.

Range sum. An internal node α_j contributes to the range sum d(l : h) only if α_j ∈ path(d_l) ∪ path(d_h):

d(l : h) = Σ_{α_j ∈ path(d_l) ∪ path(d_h)} x_j

where

x_j = (h - l + 1) α_j                                                  if j = 0
x_j = ( |leftleaves(α_j, l : h)| - |rightleaves(α_j, l : h)| ) α_j     otherwise

and where leftleaves(α_j, l : h) = leftleaves(α_j) ∩ {d_l, d_{l+1}, ..., d_h} (i.e., the intersection of leftleaves(α_j) with the summation range), and rightleaves(α_j, l : h) is defined similarly. Thus, the reconstruction of a single data value involves the summation of at most log N + 1 coefficients, and the reconstruction of a range sum involves the summation of at most 2 log N + 1 coefficients, regardless of the width of the range.

3 The basics of our development

3.1 Workload-based error metrics

Let D = (d_0, ..., d_{N-1}) be a sequence with N = 2^j values. Denote the set of point queries as Q = (q_0, ..., q_{N-1}), where q_i is a query whose answer is d_i. Let a workload W = (c_0, ..., c_{N-1}) be a vector of weights that represents the probability distribution from which future point queries are to be generated. Let (u_0, ..., u_{N-1}) be a basis of R^N; then D = Σ_i α_i u_i. We can represent D by a vector of coefficients (α_0, ..., α_{N-1}). Suppose we want to approximate D using a subset of the coefficients S ⊆ {α_0, ..., α_{N-1}}, where |S| = M. Then, for any subset S we can define a weighted norm WL2 with respect to S, that provides a measure of the errors expected for queries drawn from the probability distribution represented by W, when using S as a synopsis. S is then referred to as a workload-based wavelet synopsis. Denote by d̂_i an approximation of d_i using S. There are two standard ways to measure the error over the i-th data value (equivalently, point query): the absolute error, e_a(i) = e_a(q_i) = |d_i - d̂_i|; and the relative error, e_r(i) = e_r(q_i) = |d_i - d̂_i| / max{|d_i|, s}, where s is a positive bound that prevents small values from dominating the relative error. While the general (non-workload-based) approach is to reduce the L2 norm of the vector of errors (e_1, ..., e_N) (where e_i = e_a(i) or e_i = e_r(i)), here we generalize the L2 norm to reflect the query workload.
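The single-value reconstruction property of Sec. 2.4 can be sketched as a walk down the error tree stored in the flat transform array (our own indexing helper, assuming the layout of Sec. 2.1, where internal node j has children 2j and 2j + 1):

```python
def reconstruct(coeffs, i):
    """Reconstruct data value d_i from the Haar transform array by adding
    (left subtree) or subtracting (right subtree) the at most log N + 1
    coefficients on path(d_i)."""
    n = len(coeffs)
    value = coeffs[0]      # overall average: delta = +1 for every leaf
    j = 1                  # index of the coarsest detail coefficient
    lo, hi = 0, n - 1      # leaf range covered by the current node
    while j < n:
        mid = (lo + hi) // 2
        if i <= mid:       # d_i lies under the left child: add
            value += coeffs[j]
            j, hi = 2 * j, mid
        else:              # d_i lies under the right child: subtract
            value -= coeffs[j]
            j, lo = 2 * j + 1, mid + 1
    return value

transform = [2.75, -1.25, 0.5, 0, 0, -1, -1, 0]
print([reconstruct(transform, i) for i in range(8)])
# [2.0, 2.0, 0.0, 2.0, 3.0, 5.0, 4.0, 4.0]
```

Applied to the full transform of Sec. 2.1, this recovers the original signal S exactly; applied to a thresholded array, it yields the approximate answers d̂_i.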
Given a workload W that consists of all the queries' probabilities c_1, ..., c_N (where c_i is the probability that q_i appears), the weighted L2 norm of the vector of (absolute or relative) errors e = (e_1, ..., e_N) is:

WL2(e) = ||e||_w = sqrt( Σ_i c_i e_i^2 ),   where 0 < c_i ≤ 1 and Σ_i c_i = 1

The intuition behind this definition of the norm is to give each data value d_i (or equivalently each point query q_i) some weight that represents its significance. In the above case, the square of the WL2 norm is the expected squared error for a point query that is drawn from the given distribution. In other words, minimizing this norm of the error minimizes the expected squared error of an answer to a query. In general, the weights given to data values need not necessarily represent a probability distribution of point queries, but can be any other significance measure. For example, in Sec. 6 we use weights to solve the problem of minimizing the mean-squared relative error measured over the data values (the non-workload-based case). Notice that this is a generalization of the MSE norm: by taking equal weights for each query, meaning c_i = 1/N for each i, and e_i = e_a(i), we get the standard MSE norm. We use the term workload-based error for the WL2 norm of the vector of errors e. When the e_i are absolute (resp. relative) errors, the workload-based error is called the WB-MSE (resp. WB-MRE).
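As a small illustration of how the workload shifts the norm (the error and weight vectors below are made up for illustration):

```python
import math

def wl2(errors, weights):
    """Weighted L2 norm: sqrt(sum of c_i * e_i^2). With uniform weights
    c_i = 1/N this is the square root of the standard MSE."""
    assert abs(sum(weights) - 1.0) < 1e-9  # the workload is a distribution
    return math.sqrt(sum(c * e * e for c, e in zip(weights, errors)))

errors = [1.0, 0.0, 2.0, 1.0]
uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]   # query q_0 dominates the workload
print(wl2(errors, uniform), wl2(errors, skewed))
```

Under the skewed workload the large error e_2 = 2 is discounted, so the workload-based error is smaller than under the uniform workload, even though the error vector is unchanged.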

3.2 Optimal thresholding in orthonormal bases

The construction is based on Parseval's formula, and a known theorem that results from it (Thm. 1).

Parseval's formula. Let V be a vector space, where v ∈ V is a vector and {u_0, ..., u_{N-1}} is an orthonormal basis of V. We can express v as v = Σ_{i=0}^{N-1} α_i u_i. Then

||v||^2 = Σ_{i=0}^{N-1} α_i^2    (1)

An M-term approximation is achieved by representing v using a subset of coefficients S ⊆ {α_0, ..., α_{N-1}}, where |S| = M. The error vector is then e = Σ_{i ∉ S} α_i u_i. By Parseval's formula, ||e||^2 = Σ_{i ∉ S} α_i^2. This proves the following theorem.

Theorem 1 (Parseval-based optimal thresholding) Let V be a vector space, where v ∈ V is a vector and {u_0, ..., u_{N-1}} is an orthonormal basis of V. We can represent v by {α_0, ..., α_{N-1}}, where v = Σ_{i=0}^{N-1} α_i u_i. Suppose we want to approximate v using a subset S ⊆ {α_0, ..., α_{N-1}}, where |S| = M. Picking the M largest coefficients (in absolute value) for S minimizes the L2 norm of the error vector, over all possible subsets of M coefficients.

Given an inner product, based on this theorem one can easily find an optimal synopsis by choosing the M largest coefficients.

3.3 Optimality over enhanced wavelet synopses

Notice that in the previous section we limited ourselves to picking subsets of coefficients with their original values from the linear combination that spans v (as is usually done). In case {u_0, ..., u_{N-1}} is a wavelet basis, these are the coefficients that result from the wavelet transform. We next show that the optimal thresholding according to Thm. 1 is optimal even according to an enhanced definition of M-term approximation. We define enhanced wavelet synopses as wavelet synopses that allow arbitrary values for the retained wavelet coefficients, rather than the original values that resulted from the transform. The set of possible standard synopses is a subset of the set of possible enhanced synopses, and therefore an optimal synopsis according to the standard definition is not necessarily optimal according to the enhanced definition.

Theorem 2 When using an orthonormal basis, choosing the M largest coefficients with their original values is an optimal enhanced wavelet synopsis.
Proof: The proof is based on the fact that the basis is orthonormal. It is enough to show that given some synopsis of M coefficients with original values, any change to the values of some subset of coefficients in the synopsis would only make the approximation error larger. Let u_1, ..., u_N be an orthonormal basis and let v = α_1 u_1 + ... + α_N u_N be the vector we would like to approximate by keeping only M wavelet coefficients. Without loss of generality, suppose we choose the first M coefficients and have the following approximation for v: ṽ = Σ_{i=1}^{M} α_i u_i. According to Parseval's formula, ||e||^2 = Σ_{i=M+1}^{N} α_i^2, since the basis is orthonormal. Now suppose we change the values of some subset of j retained coefficients to new values. Let us see that due to the orthonormality of the basis this would only make the error larger. Without loss of generality, we

would change the first j coefficients, meaning we change α_1, ..., α_j to α'_1, ..., α'_j. In this case the approximation would be ṽ = Σ_{i=1}^{j} α'_i u_i + Σ_{i=j+1}^{M} α_i u_i. The approximation error would be v - ṽ = Σ_{i=1}^{j} (α_i - α'_i) u_i + Σ_{i=M+1}^{N} α_i u_i. It is easy to see that the error of the approximation would be:

||e||^2 = ⟨v - ṽ, v - ṽ⟩ = Σ_{i=1}^{j} (α_i - α'_i)^2 + Σ_{i=M+1}^{N} α_i^2 > Σ_{i=M+1}^{N} α_i^2

4 The workload-based inner product

In this section, we define the problem of finding an optimal workload-based synopsis in terms of a weighted-inner-product space, and solve it relying on this construction. Here we deal with the case where the e_i are absolute errors (the algorithm minimizes the WB-MSE). An extension to relative errors (WB-MRE) is introduced in Sec. 6. Our development is as follows:

1. Transforming the data vector D into an equivalent representation as a function f in a space of piecewise constant functions over [0, 1). (Sec. 4.1)
2. Defining the workload-based inner product. (Sec. 4.2)
3. Using the inner product to define an L2 norm, and showing that the newly defined norm is equivalent to the weighted L2 norm (WL2). (Sec. 4.3)
4. Defining a weighted Haar basis which is orthonormal with respect to the new inner product. (Sec. 4.4)

Based on Thm. 1 and Thm. 2, one can then easily find an optimal workload-based wavelet synopsis with respect to a weighted Haar wavelet basis.

4.1 Transforming the data vector into a piecewise constant function

We assume that our approximated data vector D is of size N = 2^j. As in [19], we treat sequences (vectors) of 2^j points as piecewise constant functions defined on the half-open interval [0, 1). In order to do so, we use the concept of a vector space from linear algebra. A sequence of one point is just a function that is constant over the entire interval [0, 1); we let V_0 be the space of all these functions. A sequence of two points is a function that has two constant parts over the intervals [0, 1/2) and [1/2, 1). We call the space containing all these functions V_1.
If we continue in this manner, the space V_j will include all piecewise constant functions on the interval [0, 1), with the interval divided equally into 2^j different sub-intervals. We can now think of every one-dimensional sequence D of 2^j values as being an element, or vector f, in V_j.

4.2 Defining a workload-based inner product

The first step is to choose an inner product defined on the vector space V_j. Since we want to minimize a workload-based error (and not the regular L2 error), we start by defining a new workload-based inner product. The new inner product is a generalization of the standard inner product. It is a sum of N = 2^j weighted standard products, each of them defined over an interval of size 1/N (the factor N below normalizes the length 1/N of each sub-interval):

⟨f, g⟩ = Σ_{i=0}^{N-1} N c_i ∫_{I_i} f(x) g(x) dx,   where I_i = [i/N, (i+1)/N), 0 < c_i ≤ 1 and Σ_i c_i = 1    (2)
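Because f and g are constant on each of the N sub-intervals, the integral in (2) reduces to a weighted dot product of the two value vectors; a minimal sketch (the helper name is ours):

```python
def weighted_inner(f, g, c):
    """Workload-based inner product of two piecewise-constant functions on
    [0,1), given by their values on the N equal sub-intervals. Interval i
    contributes N * c_i * (f_i * g_i / N) = c_i * f_i * g_i."""
    return sum(ci * fi * gi for ci, fi, gi in zip(c, f, g))

c = [0.5, 0.25, 0.125, 0.125]           # point-query probabilities (made up)
f = [2.0, 2.0, 0.0, 2.0]                # data vector viewed as f in V_j
print(weighted_inner(f, f, c))          # squared weighted norm of f -> 3.5
print(weighted_inner(f, [1.0] * 4, c))  # weighted average of f -> 1.75
```

With uniform weights c_i = 1/N this reduces to (1/N) times the standard dot product, i.e., the standard inner product on V_j.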

Lemma 1 ⟨f, g⟩ is an inner product.

Proof: Let us check that ⟨·, ·⟩ : V_j × V_j → R satisfies the conditions of an inner product.

Symmetry:

⟨f, g⟩ = Σ_{i=0}^{N-1} N c_i ∫_{I_i} f(x) g(x) dx = Σ_{i=0}^{N-1} N c_i ∫_{I_i} g(x) f(x) dx = ⟨g, f⟩

Bilinearity:

⟨a f_1 + b f_2, g⟩ = Σ_{i=0}^{N-1} N c_i ∫_{I_i} (a f_1 + b f_2)(x) g(x) dx
= a Σ_{i=0}^{N-1} N c_i ∫_{I_i} f_1(x) g(x) dx + b Σ_{i=0}^{N-1} N c_i ∫_{I_i} f_2(x) g(x) dx
= a ⟨f_1, g⟩ + b ⟨f_2, g⟩

and, with a similar proof, ⟨f, a g_1 + b g_2⟩ = a ⟨f, g_1⟩ + b ⟨f, g_2⟩.

Positive definiteness:

⟨f, f⟩ = Σ_{i=0}^{N-1} N c_i ∫_{I_i} f^2(x) dx ≥ 0

and ⟨f, f⟩ = 0 iff f ≡ 0, since c_i > 0 for each i.

As mentioned before, a coefficient c_i represents the probability (or a weight) of the i-th point query q_i, whose answer is the i-th data value, which is the function value on the i-th interval. When all coefficients c_i are equal to 1/N (a uniform distribution of queries), we get the standard inner product, and therefore this is a generalization of the standard inner product.

4.3 Defining a norm based on the inner product

Based on that inner product we define an inner-product-based (IPB) norm:

||f||_IPB = sqrt(⟨f, f⟩)    (3)

Lemma 2 The norm ||·||_IPB measured over the vector of absolute errors is the weighted L2 norm of this vector, i.e., ||e||^2_IPB = Σ_i c_i e_i^2 = ||e||^2_w.

Proof: Let f ∈ V_j be a function and let f̃ ∈ V_j be a function that approximates f. Let the error function be e = f - f̃ ∈ V_j. Then the squared norm of the error function is:

||e||^2_IPB = ⟨e, e⟩ = Σ_{i=0}^{N-1} N c_i ∫_{I_i} e^2(x) dx = Σ_{i=0}^{N-1} N c_i ∫_{I_i} (f - f̃)^2(x) dx = Σ_{i=0}^{N-1} c_i e_i^2

where e_i is the error on the i-th function value (the last equality holds because e is constant, with value e_i, over each interval I_i of length 1/N). This is exactly the square of the previously defined weighted L2 norm. Notice that when all coefficients are equal to 1/N we get the regular L2 norm (MSE), and therefore this is a generalization of the regular L2 norm. Our purpose is to minimize the workload-based error, which is the WL2 norm of the vector of errors.

4.4 Defining an orthonormal basis

At this stage we would like to use Thm. 1. The next step is thus finding an orthonormal (with respect to the workload-based inner product) wavelet basis for the space V_j. The basis is a Weighted Haar basis. For each workload-based inner product (defined by a given query workload) there is a corresponding orthonormal weighted Haar basis, and our algorithm finds this basis in linear time, given the workload of point queries. We describe the bases here, and see how to find a basis based on a given workload of point queries. We will later use this information in the algorithmic part.

In order to build a weighted Haar basis, we take the Haar basis functions, and for the k-th basis function we multiply its positive (resp. negative) part by some x_k (resp. y_k). We would like to choose x_k and y_k so that we get an orthonormal basis with respect to our inner product. Let us illustrate this with drawings: instead of using Haar basis functions (Fig. 2), we use functions of the kind illustrated in Fig. 3, where x_k and y_k are not necessarily (and probably not) equal, so our basis looks like the one in Fig. 4.
How do we choose x_k and y_k? Let u_k be some Haar basis function as described above. Let [a_{k0}, a_{k1}) be the interval over which the basis function is positive, and let [a_{k1}, a_{k2}) be the interval over which the function is negative. Recall that a_{k0}, a_{k1} and a_{k2} are all multiples of 1/N, and therefore each interval precisely contains some number of consecutive intervals of the form [i/N, (i+1)/N) (also a_{k1} = (a_{k0} + a_{k2})/2). Moreover, the size of the interval over which the function is positive (resp. negative) is 1/2^i for some i ≤ j (as we recall, N = 2^j). Recall that for the i-th interval of size 1/N, namely [i/N, (i+1)/N), there is a

Figure 2: An example of a Haar basis function

corresponding weight coefficient c_i, which is the coefficient used in the inner product. Notice that each Haar basis function is positive (negative) over some number of whole such intervals. We can therefore associate the sum of the coefficients of the intervals under the positive (negative) part of the function with the positive (negative) part of the function. Let us denote the sum of weight coefficients (c_i's) corresponding to intervals under the positive (resp. negative) part as l_k (resp. r_k).

Lemma 3 Suppose for each Haar basis function v_k we choose x_k and y_k such that

x_k = sqrt( r_k / (l_k r_k + l_k^2) )    and    y_k = sqrt( l_k / (l_k r_k + r_k^2) )

and multiply the positive (resp. negative) part of v_k by x_k (resp. y_k); by doing that we get an orthonormal set of N = 2^j functions, meaning we get an orthonormal basis.

Proof: We first show that when taking x_k and y_k such that x_k l_k = y_k r_k, the basis is orthogonal. It is enough to show that the inner product of any v_k with a constant function is 0. To see why that suffices: let u and v be some two Haar basis functions, and let I_u and I_v be the intervals over which u and v are different from zero, respectively. If there is some point (interval) over which both functions are different from zero, then by the Haar basis definition we get either I_u ⊆ I_v or I_v ⊆ I_u. Suppose I_v ⊆ I_u; then I_v is contained only in the negative part of I_u, or only in the positive part of I_u, again by the Haar basis definition. Consequently, when taking the inner product of u and v, there are two possible cases: either there is no point where both functions are different from zero, or the non-zero interval of one function is completely contained in a constant part of the other function. Obviously this holds for our weighted Haar basis as well. Now, let us verify that the inner product of some v_k with a constant function f(x) = m is zero:

⟨v_k, f⟩ = Σ_{i=0}^{N-1} N c_i ∫_{I_i} v_k(x) f(x) dx
= m Σ_{i : v_k(I_i) > 0} N c_i ∫_{I_i} v_k(x) dx + m Σ_{i : v_k(I_i) < 0} N c_i ∫_{I_i} v_k(x) dx

14 Fgure 3: An example for a Weghted Haar Bass functon Fgure 4: the weghted Haar Bass along wth the workload coeffcents, each coeffcent under ts correspondng nterval. For each level, the functons of the level are dfferent from zero over ntervals of equal sze. m xk m { v k( )>0} { v k( )>0} c m yk c x k m { v k( )<0} { v k( )<0} c y k = c = m (x k l k y k r k ) = 0 ow, n order to get an orthonormal bass all we have to do s to normalze those bass functons. 14

Let us compute the norm of some v_k whose positive part is set to x_k and whose negative part is set to y_k. Again, since v_k is constant on each interval, each interval contributes c_i times the squared value on it:

    <v_k, v_k> = sum_{i : v_k(i) > 0} c_i x_k^2 + sum_{i : v_k(i) < 0} c_i y_k^2 = x_k^2 l_k + y_k^2 r_k.

From the orthogonality condition we take y_k = x_k l_k / r_k. Requiring a unit norm,

    x_k^2 l_k + y_k^2 r_k = 1   =>   x_k^2 l_k + x_k^2 l_k^2 / r_k = 1   =>   x_k^2 (l_k + l_k^2 / r_k) = 1.

So we take:

    x_k = 1 / sqrt(l_k + l_k^2 / r_k) = sqrt( r_k / (l_k r_k + l_k^2) ),        y_k = sqrt( l_k / (l_k r_k + r_k^2) ).

There is a special case, which is the constant basis function (representing the total weighted average), v_0(x) = const. We would like the norm of this function to be 1 as well. We just put x_k = y_k in the equation x_k^2 l_k + y_k^2 r_k = 1 and get v_0(x) = x_k = y_k = 1 / sqrt(l_k + r_k) = const. Notice that had all the workload coefficients been equal (c_i = 1/N), we would get the standard Haar basis used to minimize the standard L2 norm.

As we have seen, this is an orthonormal basis of our function space. In order to see that it is a wavelet basis, notice that for each k = 1, ..., j, the first 2^k functions form an orthonormal set belonging to V_k (whose dimension is 2^k), and are therefore a basis of V_k.

5 The algorithm for the WWW transform

In this section we describe the algorithmic part. Given a workload of point queries and a data vector to be approximated, we build workload-based wavelet synopses of the data vector using a weighted Haar basis. The algorithm has two parts:

1. Computing efficiently a weighted Haar basis, given a workload of point queries (Sec. 5.1).
2. Computing efficiently the weighted Haar wavelet transform with respect to the chosen basis (Sec. 5.2).
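As a quick numerical check of the closed forms derived above (a sketch of our own, not part of the paper; the helper name `basis_values` is ours), the following computes x_k and y_k from given l_k and r_k and verifies both the orthogonality condition x_k l_k = y_k r_k and the unit-norm condition x_k^2 l_k + y_k^2 r_k = 1:

```python
import math

def basis_values(l_k, r_k):
    """Return (x_k, y_k): the values of a weighted Haar basis function on its
    positive and negative parts, given the sums of workload coefficients
    l_k (positive part) and r_k (negative part)."""
    x_k = math.sqrt(r_k / (l_k * r_k + l_k ** 2))
    y_k = math.sqrt(l_k / (l_k * r_k + r_k ** 2))
    return x_k, y_k

# Example with unequal weight sums.
l_k, r_k = 0.3, 0.7
x_k, y_k = basis_values(l_k, r_k)

# Orthogonality condition: x_k * l_k == y_k * r_k.
assert abs(x_k * l_k - y_k * r_k) < 1e-12
# Unit norm: x_k^2 * l_k + y_k^2 * r_k == 1.
assert abs(x_k ** 2 * l_k + y_k ** 2 * r_k - 1.0) < 1e-12

# Uniform case: at the top level l_k = r_k = 1/2, giving x_k = y_k = 1,
# i.e. the standard (normalized) Haar basis function on [0, 1).
assert basis_values(0.5, 0.5) == (1.0, 1.0)
```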

5.1 Computing efficiently a weighted Haar basis

Note that at this point we already have a method for finding an orthonormal basis with respect to a given workload-based inner product. Recall that in order to know x_k and y_k for every basis function we need to know the corresponding l_k and r_k. We now compute all those partial sums in linear time. Suppose that the basis functions are arranged in an array, as in a binary-tree representation. The highest-resolution functions are at indexes N/2, ..., N-1, which form the lowest level of the tree. The functions of the next resolution level are at indexes N/4, ..., N/2 - 1, and so on, until the constant basis function at index 0. Notice that for the lowest-level (highest-resolution) functions (indexes N/2, ..., N-1) we already have their l_k's and r_k's: these are exactly the workload coefficients. This can easily be seen in Fig. 4 for the lower four functions. Notice that after computing the accumulated sums for the functions at resolution level i, we have all the information needed for computing the functions of the level above: let u_k be a function at resolution level i and let u_2k, u_2k+1 be at level i+1, with their supports included in u_k's support (u_k is their ancestor in the binary tree of functions). We can then use the following formula for computing l_k and r_k:

    l_k = l_2k + r_2k,        r_k = l_2k+1 + r_2k+1.

This can be seen in the example of Fig. 4. Thus, we can compute in one pass only the lowest level, and build the upper levels bottom-up (in a way somewhat similar to the Haar wavelet transform). At the end of each phase of the algorithm (a phase being the computation of the functions of a specific level) we keep a temporary array holding all the pairwise sums of the l_k's and r_k's from that phase, and use them for computing the functions of the next phase. Clearly, the running time is O(N). The number of I/Os is O(N/B) (where B is the disk block size), since the process is similar to the computation of the Haar wavelet transform. A pseudo-code of the computation can be found in Fig. 14.
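The bottom-up pass just described can be sketched as follows (a minimal illustration of our own, not the paper's pseudo-code of Fig. 14; the function name `compute_lr` and the convention for the constant function's l/r are ours). Functions are stored in heap order, with index 0 the constant function and indexes N/2, ..., N-1 the highest-resolution functions:

```python
def compute_lr(c):
    """Compute l[k] and r[k] for every weighted Haar basis function, with
    functions stored in heap order: index 0 is the constant function and
    indexes N/2 .. N-1 are the highest-resolution functions. Runs in O(N)."""
    n = len(c)  # must be a power of two
    l = [0.0] * n
    r = [0.0] * n
    # Lowest level: l and r are the workload coefficients themselves.
    for k in range(n // 2, n):
        i = 2 * (k - n // 2)  # leftmost data interval under function k
        l[k], r[k] = c[i], c[i + 1]
    # Upper levels, bottom-up: l_k = l_2k + r_2k, r_k = l_2k+1 + r_2k+1.
    for k in range(n // 2 - 1, 0, -1):
        l[k] = l[2 * k] + r[2 * k]
        r[k] = l[2 * k + 1] + r[2 * k + 1]
    # Constant function: by our convention its whole support counts as "positive".
    l[0] = l[1] + r[1]
    r[0] = 0.0
    return l, r

c = [0.1, 0.2, 0.05, 0.15, 0.1, 0.1, 0.2, 0.1]
l, r = compute_lr(c)
# The root wavelet function (index 1) splits the total weight between half-supports:
assert abs(l[1] - sum(c[:4])) < 1e-12 and abs(r[1] - sum(c[4:])) < 1e-12
```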
The createf uncton() functon takes two sums of weght coeffcents correspondng to the functon s postve part and to the functon s negatve part, and buld a functon whose postve (resp. negatve) part s value s x k (resp. y k ) usng the followng formulae: x k = rk l k r k + l 2 k y k = lk l k r k + r 2 k 5.2 Computng a weghted Haar wavelet transform Gven the bass we would lke to effcently perform the wavelet transform wth respect to that bass. Let us look at the case of = 2 (Fg. 5). Suppose we would lke to represent the functon n Fg. 6. It s easy to compute the followng result (denote α as the coeffcent of f ): α 0 = yv 0 + xv 1 x + y α 1 = v 0 v 1 x + y (by solvng 2x2 matrx). otce that the coeffcents are weghted averages and dfferences, snce the transform generalzes the standard Haar transform (by takng x = y = 2 we get the standard Haar transform). It s easy to reconstruct the orgnal functon from the coeffcents: v 0 = α 0 + xα 1 v 1 = α 0 yα 1 Ths mples a straghtforward method to compute the wavelet transform (whch s I/O effcent as well) accordng to the way we compute a regular wavelet transform wth respect to the Haar 16

Figure 5: Weighted Haar transform with two functions

Figure 6: A simple function with 2 values over [0, 1)

We go over the data and compute the weighted differences, which are the coefficients of the bottom-level functions. We keep the weighted averages, which can be represented solely by the rest of the basis functions (the lower-resolution functions, as in the regular Haar wavelet transform), in another array. We repeat the process over the averages again and again until we have the overall average, which is added to our array as the coefficient of the constant function (v_0(x) = const). While computing the transform, in addition to reading the values of the signal, we need to read the basis function relevant to the current stage (in order to use the x_k and y_k of the function employed in the formula above). This is easy to do, since all the functions are stored in an array F, and the index of a function is determined by the iteration number and is identical to the index of the currently computed coefficient. A pseudo-code of the algorithm can be found in Fig. 15. As we know, the Haar wavelet transform is a linear-time algorithm. The steps of our algorithm are identical to the steps of the Haar algorithm, with the addition of reading the data at F[i] (the x_k and y_k of the function) during the i-th iteration. Therefore the I/O complexity of this phase remains O(N/B) (B is the disk block size), with O(N) running time. Once we have the coefficients with respect to the orthonormal basis, we keep the largest M coefficients, along with their corresponding M functions, and discard the smallest coefficients, relying on Thm. 1. We can do this in linear time using the M-approximate quantile algorithm [13].

6 Optimal synopsis for mean relative error

We next show that a variant of the weighted-wavelets-based algorithm minimizes the weighted L2 norm of the vector of relative errors, weighted by the query workload, using weighted wavelets. This demonstrates another use of assigning weights to data values, here used to minimize the mean-squared relative error measured over the data values.
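The reweighting idea just described can be sketched directly (our own illustration; the name `relative_error_weights` is ours, and `s` stands for the sanity bound used in the paper's relative-error definition). Dividing each workload coefficient by the squared data value turns a weighted squared absolute error into a weighted squared relative error:

```python
def relative_error_weights(c, d, s=1e-6):
    """Turn workload weights c_i into relative-error weights w_i = c_i / max(d_i, s)^2.
    Running the WB-MSE machinery of Sec. 5 with weights w then minimizes the
    workload-weighted L2 norm of *relative* errors (the sanity bound s guards
    against tiny or zero data values)."""
    return [c_i / max(d_i, s) ** 2 for c_i, d_i in zip(c, d)]

c = [0.25, 0.25, 0.25, 0.25]
d = [10.0, 20.0, 40.0, 80.0]
d_hat = [9.0, 21.0, 38.0, 84.0]  # some approximation of d
w = relative_error_weights(c, d)

# Weighted squared absolute error under w equals
# weighted squared relative error under c (for d_i >= s).
lhs = sum(w_i * (di - dhi) ** 2 for w_i, di, dhi in zip(w, d, d_hat))
rhs = sum(c_i * ((di - dhi) / di) ** 2 for c_i, di, dhi in zip(c, d, d_hat))
assert abs(lhs - rhs) < 1e-12
```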
Recall that in order to minimize the weighted L2 norm of relative errors, we need to minimize sum_{i=1}^{N} c_i ((d_i - d̂_i) / d_i)^2 (actually sum_{i=1}^{N} c_i ((d_i - d̂_i) / max{d_i, s})^2, but the idea is the same). Since D = d_1, ..., d_N is part of the input of the algorithm, it is fixed throughout the algorithm's execution. We can thus

divide each c_i by d_i^2 and get a new vector of weights: W = (c_1 / d_1^2, ..., c_N / d_N^2). Relying on our previous results, and using the new vector of weights, we minimize

    sum_{i=1}^{N} (c_i / d_i^2) (d_i - d̂_i)^2 = sum_{i=1}^{N} c_i ((d_i - d̂_i) / d_i)^2,

which is the W-weighted L2 norm of relative errors. Notice that in the uniform case (c_i = 1/N) the algorithm minimizes the mean relative error over all data values. As far as we know, this is the first algorithm that minimizes the mean relative error over the data values.

7 Experiments

In this section we demonstrate the advantage obtained by our workload-based wavelet synopses. All our experiments were done using the τ-Synopses system [15]. For our experimental studies we used both synthetic and real-life data sets. The synthetic data sets are taken from the TPC-H benchmark data, and the real-life data sets are taken from the Forest CoverType data provided by the KDD Data archive of the University of California. The data sets are:

1. TPCH1 - Data attribute 1 from table ORDERS, filtered by attribute O CUSTKEY, which contains about 150,000 distinct values.
2. KDD2048 - Data attribute Aspect from table CovTypeAgr, filtered by Elevation, from the KDD data, with a total of 2048 distinct values.

The sets of queries were generated independently by a Zipf distribution generator. We used queries of different skews, distributed according to several Zipf parameter values. We took the Zipf parameters 0.2, 0.5 and 0.8, in order to test the behavior of the synopses under different skews, ranging from close-to-uniform to highly skewed. The sets of queries contained 5000 queries over each data set.

In Fig. 7 we compare the standard wavelet synopsis from [16] with our WB-MSE wavelet synopsis. The standard synopsis is depicted as a solid line. We measured the WB-MSE as a function of synopsis size, measured as the number of coefficients in the synopsis. For each M = 10, 20, ..., 100 we built synopses of size M using both methods and compared the WB-MSE error, measured with respect to a given workload of queries. The workload contained 5000 Zipf-distributed point queries, with a Zipf parameter of 0.5.
The data set was the TPCH1 data. As the synopsis size increases, the error of the workload-based algorithm becomes much smaller than the error of the standard algorithm. The reason for this is that synopses of sizes 10, ..., 100 are very small with respect to data of size 150,000. Since the standard algorithm does not take the query workload into account, its results are more or less the same for all synopsis sizes in the experiment. The workload-based synopsis, however, adapts itself to the query workload, which is of size 5000. All the data values that are not queried by the workload are given very small importance weights, so the synopsis actually has to be accurate over fewer than 5000 values. Thus, there is a sharp decrease in the error of the workload-based algorithm as the synopsis size increases.

In Fig. 8 we ran a similar experiment, this time with the KDD2048 data. The standard synopsis is again depicted as a solid line. As in the previous experiment, we measured the WB-MSE as a function of synopsis size. For each M = 20, 40, ..., 200 we built synopses of size M using both

methods and compared the WB-MSE error, measured with respect to a given workload of queries. The workload contained 5000 Zipf-distributed point queries, with a Zipf parameter of 0.5. The data was the KDD2048 data, of size 2048. We see that for each synopsis size the error of the standard algorithm is approximately twice the error of the workload-based algorithm. The reason for this is that here the query workload is larger than the data set, in contrast to the previous experiment. Thus, most of the data is queried by the workload, so the importance weights given to data values are more uniform than in the previous experiment. Therefore, the error difference is smaller than in the previous experiment, since the advantage of the workload-based algorithm becomes more significant as the workload gets more skewed. However, since the workload-based synopsis adapts itself to the workload, its error is still better than that of the standard synopsis, which assumes a uniform distribution.

In Fig. 9 we compare the standard wavelet synopsis from [16] and the adaptive-greedy workload-based wavelet synopsis from [14] with our WB-MRE wavelet synopsis. The standard synopsis is depicted as a dotted line with x's. Since it is hard to distinguish between the other two synopses at this resolution, we zoom into this figure in Fig. 10. We measured the WB-MRE as a function of synopsis size, measured as the number of coefficients in the synopsis. For each M = 20, 40, ..., 200 we built synopses of size M using the three methods and compared the WB-MRE error, measured with respect to a given workload of queries. The workload contained 3000 Zipf-distributed point queries, with a Zipf parameter of 0.5. The data set was the KDD2048 data. Since the standard algorithm takes into account neither the query workload nor relative errors, its approximation error is substantially larger than the approximation errors of the workload-based algorithms, for every synopsis size.

In Fig. 10 we compare the adaptive-greedy workload-based synopsis from [14] with our WB-MRE synopsis.
The adaptive-greedy synopsis is depicted as a solid line. We measured the WB-MRE as a function of synopsis size, measured as the number of coefficients in the synopsis. For each M = 20, 40, ..., 200 we built synopses of size M using the two methods and compared the WB-MRE error, measured with respect to a given workload of queries. The workload contained 5000 Zipf-distributed point queries, with a Zipf parameter of 0.5. The data set was the KDD2048 data. For each synopsis size, the approximation error of the adaptive-greedy synopsis is noticeably larger than the error of our WB-MRE algorithm.

In Fig. 11 we depict the WB-MRE as a function of synopsis size, for three given query workloads, distributed with Zipf parameters 0.2, 0.5 and 0.8. The data set was the KDD2048 data set, and the workloads consisted of 5000 queries each. For each of the three given workloads we built synopses of size M = 50, 100, ..., 500 and depicted the WB-MRE as a function of synopsis size. It can be seen that many wavelet coefficients can be discarded before the error increases significantly. This is a desired feature for any synopsis. For example, for synopses of size 500 the WB-MRE is smaller than 0.05, and for synopses of size 250 the WB-MRE is smaller than 0.1. It can also be seen that the higher the skew, the more accurate the workload-based synopses. The reason is that when the skew gets higher, the synopsis has to be accurate over a smaller number of data values.

In Fig. 12 we compare the standard algorithm from [16] with our WB-MRE algorithm in a different way than before: we compare the ratio between the approximation error of the standard algorithm and the approximation error of the WB-MRE algorithm, for different workload skews. The comparison was done for three different query workloads, distributed with different Zipf parameters. The workloads contained 5000 queries each, distributed with Zipf parameters 0.2, 0.5 and 0.8 respectively. The data set was the KDD2048. For each given workload we measured the error ratio between the two synopses, for each synopsis size M = 50, 100, ..., 500. It is clearly seen

that the higher the skew of the workload, the higher the ratio between the approximation errors of the synopses. The reason is that as the workload gets farther from uniform, the advantage of the workload-based algorithms naturally becomes more significant over the standard synopsis, which assumes a uniform workload.

In Fig. 13 we show the robustness of the workload-based wavelet synopses to deviations from the predefined workload. The experiment addresses the problem of incorrect estimation of the future workload. When building our synopsis, we assumed the queries would be distributed as Zipf(0.2). We fixed the synopsis size and built our synopsis. We then used the synopsis to answer query workloads distributed differently than expected, e.g., with Zipf parameters 0.3, 0.4, etc. The figure depicts the WB-MRE as a function of the difference between the actual query distribution and our estimated query distribution (estimated as Zipf(0.2)). The skew difference is the difference between the actual Zipf parameter and the estimated Zipf parameter according to which we assumed the queries are distributed. We show that small errors in the workload estimation introduce only small errors in the quality of the approximation, and that the error grows continuously as the deviation from the predefined workload increases.

8 Conclusions

In this paper we introduce the use of weighted wavelets for building optimal workload-based wavelet synopses. We present two time-optimal and I/O-optimal algorithms for workload-based wavelet synopses, which minimize the WB-MSE and the WB-MRE error measures with respect to any given query workload. The advantage of optimal workload-based wavelet synopses, as well as their robustness, was demonstrated by our experiments.

Recently, and independently of our work, Muthukrishnan [17] presented an optimal workload-based wavelet synopsis with respect to the standard Haar basis. The algorithm for building the optimal synopsis is based on dynamic programming and takes O(N^2 M / log M) time. As noted above, the standard Haar basis is not orthonormal w.r.t.
the workload-based error metric, and an optimal synopsis w.r.t. this basis is not necessarily also an optimal enhanced wavelet synopsis. Obtaining optimal enhanced wavelet synopses for the standard Haar wavelets may be an interesting open problem. Also, as quadratic time is too costly for massive data sets, it would be interesting to obtain a time-efficient algorithm for such synopses. As far as approximation error is concerned, although in general optimal synopses w.r.t. the standard Haar basis and a weighted Haar basis are incomparable, both bases have the same characteristics. It would be interesting to compare the actual approximation errors of the two synopses for various data sets. This may indeed be the subject of future work.

Acknowledgments: We thank Leon Portman for helpful discussions and for his assistance in setting up the experiments on the τ-Synopses system. We also thank Prof. Nira Dyn for helpful discussions regarding wavelet theory.


More information

Lossy Compression. Compromise accuracy of reconstruction for increased compression.

Lossy Compression. Compromise accuracy of reconstruction for increased compression. Lossy Compresson Compromse accuracy of reconstructon for ncreased compresson. The reconstructon s usually vsbly ndstngushable from the orgnal mage. Typcally, one can get up to 0:1 compresson wth almost

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Appendix B: Resampling Algorithms

Appendix B: Resampling Algorithms 407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles

More information

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced,

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced, FREQUENCY DISTRIBUTIONS Page 1 of 6 I. Introducton 1. The dea of a frequency dstrbuton for sets of observatons wll be ntroduced, together wth some of the mechancs for constructng dstrbutons of data. Then

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

CONTRAST ENHANCEMENT FOR MIMIMUM MEAN BRIGHTNESS ERROR FROM HISTOGRAM PARTITIONING INTRODUCTION

CONTRAST ENHANCEMENT FOR MIMIMUM MEAN BRIGHTNESS ERROR FROM HISTOGRAM PARTITIONING INTRODUCTION CONTRAST ENHANCEMENT FOR MIMIMUM MEAN BRIGHTNESS ERROR FROM HISTOGRAM PARTITIONING N. Phanthuna 1,2, F. Cheevasuvt 2 and S. Chtwong 2 1 Department of Electrcal Engneerng, Faculty of Engneerng Rajamangala

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

A new construction of 3-separable matrices via an improved decoding of Macula s construction

A new construction of 3-separable matrices via an improved decoding of Macula s construction Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula

More information

The Minimum Universal Cost Flow in an Infeasible Flow Network

The Minimum Universal Cost Flow in an Infeasible Flow Network Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran

More information

Inductance Calculation for Conductors of Arbitrary Shape

Inductance Calculation for Conductors of Arbitrary Shape CRYO/02/028 Aprl 5, 2002 Inductance Calculaton for Conductors of Arbtrary Shape L. Bottura Dstrbuton: Internal Summary In ths note we descrbe a method for the numercal calculaton of nductances among conductors

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

= = = (a) Use the MATLAB command rref to solve the system. (b) Let A be the coefficient matrix and B be the right-hand side of the system.

= = = (a) Use the MATLAB command rref to solve the system. (b) Let A be the coefficient matrix and B be the right-hand side of the system. Chapter Matlab Exercses Chapter Matlab Exercses. Consder the lnear system of Example n Secton.. x x x y z y y z (a) Use the MATLAB command rref to solve the system. (b) Let A be the coeffcent matrx and

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Speeding up Computation of Scalar Multiplication in Elliptic Curve Cryptosystem

Speeding up Computation of Scalar Multiplication in Elliptic Curve Cryptosystem H.K. Pathak et. al. / (IJCSE) Internatonal Journal on Computer Scence and Engneerng Speedng up Computaton of Scalar Multplcaton n Ellptc Curve Cryptosystem H. K. Pathak Manju Sangh S.o.S n Computer scence

More information

Affine transformations and convexity

Affine transformations and convexity Affne transformatons and convexty The purpose of ths document s to prove some basc propertes of affne transformatons nvolvng convex sets. Here are a few onlne references for background nformaton: http://math.ucr.edu/

More information

Report on Image warping

Report on Image warping Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.

More information

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method

More information

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013 ISSN: 2277-375 Constructon of Trend Free Run Orders for Orthogonal rrays Usng Codes bstract: Sometmes when the expermental runs are carred out n a tme order sequence, the response can depend on the run

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

An Interactive Optimisation Tool for Allocation Problems

An Interactive Optimisation Tool for Allocation Problems An Interactve Optmsaton ool for Allocaton Problems Fredr Bonäs, Joam Westerlund and apo Westerlund Process Desgn Laboratory, Faculty of echnology, Åbo Aadem Unversty, uru 20500, Fnland hs paper presents

More information

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 493 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces you have studed thus far n the text are real vector spaces because the scalars

More information

Edge Isoperimetric Inequalities

Edge Isoperimetric Inequalities November 7, 2005 Ross M. Rchardson Edge Isopermetrc Inequaltes 1 Four Questons Recall that n the last lecture we looked at the problem of sopermetrc nequaltes n the hypercube, Q n. Our noton of boundary

More information

Lecture 4: Universal Hash Functions/Streaming Cont d

Lecture 4: Universal Hash Functions/Streaming Cont d CSE 5: Desgn and Analyss of Algorthms I Sprng 06 Lecture 4: Unversal Hash Functons/Streamng Cont d Lecturer: Shayan Oves Gharan Aprl 6th Scrbe: Jacob Schreber Dsclamer: These notes have not been subjected

More information

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering / Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons

More information

4DVAR, according to the name, is a four-dimensional variational method.

4DVAR, according to the name, is a four-dimensional variational method. 4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The

More information

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals Smultaneous Optmzaton of Berth Allocaton, Quay Crane Assgnment and Quay Crane Schedulng Problems n Contaner Termnals Necat Aras, Yavuz Türkoğulları, Z. Caner Taşkın, Kuban Altınel Abstract In ths work,

More information

5 The Rational Canonical Form

5 The Rational Canonical Form 5 The Ratonal Canoncal Form Here p s a monc rreducble factor of the mnmum polynomal m T and s not necessarly of degree one Let F p denote the feld constructed earler n the course, consstng of all matrces

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

Graph Reconstruction by Permutations

Graph Reconstruction by Permutations Graph Reconstructon by Permutatons Perre Ille and Wllam Kocay* Insttut de Mathémathques de Lumny CNRS UMR 6206 163 avenue de Lumny, Case 907 13288 Marselle Cedex 9, France e-mal: lle@ml.unv-mrs.fr Computer

More information

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng

More information

Lecture 12: Classification

Lecture 12: Classification Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Some modelling aspects for the Matlab implementation of MMA

Some modelling aspects for the Matlab implementation of MMA Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton

More information

Yong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 )

Yong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 ) Kangweon-Kyungk Math. Jour. 4 1996), No. 1, pp. 7 16 AN ITERATIVE ROW-ACTION METHOD FOR MULTICOMMODITY TRANSPORTATION PROBLEMS Yong Joon Ryang Abstract. The optmzaton problems wth quadratc constrants often

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS) Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

Temperature. Chapter Heat Engine

Temperature. Chapter Heat Engine Chapter 3 Temperature In prevous chapters of these notes we ntroduced the Prncple of Maxmum ntropy as a technque for estmatng probablty dstrbutons consstent wth constrants. In Chapter 9 we dscussed the

More information

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence

More information

HMMT February 2016 February 20, 2016

HMMT February 2016 February 20, 2016 HMMT February 016 February 0, 016 Combnatorcs 1. For postve ntegers n, let S n be the set of ntegers x such that n dstnct lnes, no three concurrent, can dvde a plane nto x regons (for example, S = {3,

More information

Research Article Green s Theorem for Sign Data

Research Article Green s Theorem for Sign Data Internatonal Scholarly Research Network ISRN Appled Mathematcs Volume 2012, Artcle ID 539359, 10 pages do:10.5402/2012/539359 Research Artcle Green s Theorem for Sgn Data Lous M. Houston The Unversty of

More information

Introduction to information theory and data compression

Introduction to information theory and data compression Introducton to nformaton theory and data compresson Adel Magra, Emma Gouné, Irène Woo March 8, 207 Ths s the augmented transcrpt of a lecture gven by Luc Devroye on March 9th 207 for a Data Structures

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

Learning Theory: Lecture Notes

Learning Theory: Lecture Notes Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be

More information

Queueing Networks II Network Performance

Queueing Networks II Network Performance Queueng Networks II Network Performance Davd Tpper Assocate Professor Graduate Telecommuncatons and Networkng Program Unversty of Pttsburgh Sldes 6 Networks of Queues Many communcaton systems must be modeled

More information

1 Matrix representations of canonical matrices

1 Matrix representations of canonical matrices 1 Matrx representatons of canoncal matrces 2-d rotaton around the orgn: ( ) cos θ sn θ R 0 = sn θ cos θ 3-d rotaton around the x-axs: R x = 1 0 0 0 cos θ sn θ 0 sn θ cos θ 3-d rotaton around the y-axs:

More information

Workshop: Approximating energies and wave functions Quantum aspects of physical chemistry

Workshop: Approximating energies and wave functions Quantum aspects of physical chemistry Workshop: Approxmatng energes and wave functons Quantum aspects of physcal chemstry http://quantum.bu.edu/pltl/6/6.pdf Last updated Thursday, November 7, 25 7:9:5-5: Copyrght 25 Dan Dll (dan@bu.edu) Department

More information

Module 2. Random Processes. Version 2 ECE IIT, Kharagpur

Module 2. Random Processes. Version 2 ECE IIT, Kharagpur Module Random Processes Lesson 6 Functons of Random Varables After readng ths lesson, ou wll learn about cdf of functon of a random varable. Formula for determnng the pdf of a random varable. Let, X be

More information

Statistical Inference. 2.3 Summary Statistics Measures of Center and Spread. parameters ( population characteristics )

Statistical Inference. 2.3 Summary Statistics Measures of Center and Spread. parameters ( population characteristics ) Ismor Fscher, 8//008 Stat 54 / -8.3 Summary Statstcs Measures of Center and Spread Dstrbuton of dscrete contnuous POPULATION Random Varable, numercal True center =??? True spread =???? parameters ( populaton

More information