Edinburgh Research Explorer


Supporting User-Defined Functions on Uncertain Data

Citation for published version:
Tran, T. T. L., Diao, Y., Sutton, C. A. & Liu, A. 2013, 'Supporting User-Defined Functions on Uncertain Data', Proceedings of the VLDB Endowment (PVLDB), vol. 6, no. 6, pp.

Link: Link to publication record in Edinburgh Research Explorer
Document Version: Publisher's PDF, also known as Version of record
Published In: Proceedings of the VLDB Endowment (PVLDB)

General rights
Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and/or other copyright owners, and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights.

Take down policy
The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright, please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 2. Apr. 29
Supporting User-Defined Functions on Uncertain Data

Thanh T. L. Tran, Yanlei Diao, Charles Sutton, Anna Liu
University of Massachusetts, Amherst; University of Edinburgh

ABSTRACT
Uncertain data management has become crucial in many sensing and scientific applications. As user-defined functions (UDFs) become widely used in these applications, an important task is to capture result uncertainty for queries that evaluate UDFs on uncertain data. In this work, we provide a general framework for supporting UDFs on uncertain data. Specifically, we propose a learning approach based on Gaussian processes (GPs) to compute approximate output distributions of a UDF when evaluated on uncertain input, with guaranteed error bounds. We also devise an online algorithm to compute such output distributions, which employs a suite of optimizations to improve accuracy and performance. Our evaluation using both real-world and synthetic functions shows that our proposed GP approach can outperform the state-of-the-art sampling approach with up to two orders of magnitude improvement for a variety of UDFs.

1. INTRODUCTION
Uncertain data management has become crucial in many applications including sensor networks [9], object tracking and monitoring [22], severe weather monitoring [3], and digital sky surveys [2]. Data uncertainty arises due to a variety of reasons such as measurement errors, incomplete observations, and the use of inference to recover missing information [22]. When such data is processed via queries, its uncertainty propagates to the processed results. The ability to capture the result uncertainty is important to the end user for interpreting the derived information appropriately. For example, knowing only the mean of the result distribution cannot distinguish between a sure event and a highly uncertain event, which may result in wrong decision making; knowing more about the distribution can help the user avoid such misunderstanding and ill-informed actions. Recent work on uncertain data management has intensively studied relational query processing on uncertain data (e.g., [4, 7, 6, 9, 2, 23]).
Our work, however, is motivated by the observation that real-world applications, such as scientific computing and financial analysis, make intensive use of user-defined functions (UDFs) that process and analyze the data using complex, domain-specific algorithms. In practice, UDFs can be provided in any form of external code, e.g., C programs, and hence are treated mainly as black boxes in traditional databases. These UDFs are often expensive to compute due to the complexity of processing. Unfortunately, support for UDFs on uncertain data is largely lacking in today's data management systems. Consequently, in the tornado detection application [3], detection errors cannot be distinguished from true events due to the lack of associated confidence scores. In other applications such as computational astrophysics [2], the burden of characterizing UDF result uncertainty is imposed on the programmers: we observed that the programmers of the Sloan digital sky surveys manually code algorithms to keep track of uncertainty in a number of UDFs. These observations have motivated us to provide system support to automatically capture the result uncertainty of UDFs, hence freeing users from the burden of doing so and returning valuable information for interpreting query results appropriately. More concretely, let us consider two examples of UDFs in the Sloan digital sky surveys (SDSS) [2].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at The 39th International Conference on Very Large Data Bases, August 26th to 30th 2013, Riva del Garda, Trento, Italy. Proceedings of the VLDB Endowment, Vol. 6, No. 6. Copyright 2013 VLDB Endowment.
In SDSS, nightly observations of stars and galaxies are inherently noisy, as the objects can be too dim to be recognized in a single image. However, repeated observations allow the scientists to model the position, brightness, and color of objects using continuous distributions, which are commonly Gaussian distributions. Assume the processed data is represented as (objid, pos^p, redshift^p, ...) where pos and redshift are uncertain attributes. Then, queries can be issued to detect features or properties of the objects. We consider some example UDFs from an astrophysics package []. Query Q1 below computes the age of each galaxy given its redshift using the UDF GalAge. Since redshift is uncertain, the output GalAge(redshift) is also uncertain.

Q1: Select G.objID, GalAge(G.redshift)
    From Galaxy G

A more complex example of using UDFs is shown in query Q2, which computes the comoving volume of two galaxies whose distance is in some specific range. This query invokes two UDFs, ComoveVol and Distance, on the uncertain attributes redshift and pos respectively, together with a selection predicate on the output of the UDF Distance.

Q2: Select G1.objID, G2.objID, Distance(G1.pos, G2.pos), ComoveVol(G1.redshift, G2.redshift, AREA)
    From Galaxy AS G1, Galaxy AS G2
    Where Distance(G1.pos, G2.pos) ∈ [l, u]

Problem Statement. In this work, we aim to provide a general framework to support UDFs on uncertain data, where the functions are given as black boxes. Specifically, given an input tuple modeled by a vector of random variables X, which is characterized by a joint distribution (either continuous or discrete), and a univariate, black-box UDF f, our objective is to characterize the distribution of Y = f(X). In the example of Q2, after the join between G1 and G2, each tuple carries a random vector, X = {G1.pos, G1.redshift, G2.pos, G2.redshift, ...}, and the two UDFs produce Y1 = Distance(G1.pos, G2.pos) and Y2 = ComoveVol(G1.redshift, G2.redshift, AREA). Given the nature of our UDFs, exact derivation of result distributions may not be feasible, and hence approximation techniques will be explored. A related requirement is that the proposed solution must be able to meet user-specified accuracy goals. In addition, the proposed solution must be able to perform efficiently in an online fashion, for example, to support online interactive analysis over a large data set or data processing on real-time streams (e.g., to detect tornados or anomalies in sky surveys).

Challenges. Supporting UDFs as stated above poses a number of challenges: (1) UDFs are often computationally expensive. For such UDFs, any processing that incurs repeated function evaluation to compute the output will take a long time to complete. (2) When an input tuple has uncertain values, computing a UDF on them will produce a result with uncertainty, which is characterized by a distribution. Computing the result distribution, even when the function is known, is a nontrivial problem. Existing work in statistical machine learning (surveyed in [5]) uses regression to estimate a function, but mostly focuses on deterministic input. For uncertain input, existing work [2] computes only the mean and variance of the result, instead of the full distribution, and hence is of limited use if this distribution is not Gaussian (which is often the case). Other work [5] computes approximate result distributions without bounding the approximation errors, thus not addressing user accuracy requirements. (3) Further, most of our target applications require an online algorithm to characterize the result uncertainty of a UDF, where online means that the algorithm does not need an offline training phase before processing data. Relevant machine learning techniques such as [2, 5] are offline algorithms. In addition, a desirable online algorithm should operate with high performance in order to support online interactive analysis or data stream processing.

Contributions.
In this paper, we present a complete framework for handling user-defined functions on uncertain data. Specifically, our main contributions include:

1. An approximate evaluation framework (§2): We propose a carefully-crafted approximation framework for computing UDFs on uncertain data, including approximation metrics and objectives. These metrics, namely the discrepancy and KS measures, are a natural fit for range queries and intuitive to interpret. While many approximation metrics exist in the statistics literature, our choices of the metrics and objectives combined allow us to provide new theoretical results regarding the error bounds of output distributions.

2. Computing output distributions with error bounds (§3 and §4): We employ an approach of modeling black-box UDFs using a machine learning technique called Gaussian processes (GPs). We choose this technique due to its ability to model functions and quantify the approximation in such function modeling. Given the GP model of a UDF and uncertain input, our contribution lies in computing output distributions with error bounds. In particular, we provide an algorithm that combines the GP model of a UDF and Monte Carlo (MC) sampling to compute output distributions. We perform an in-depth analysis of the algorithm and derive new theoretical results for quantifying the approximation of the output, including bounding the errors of both the approximation of the UDF and the sampling from input distributions. These error bounds can be used to tune our model to meet accuracy requirements. To the best of our knowledge, this work is the first to quantify output distributions of Gaussian processes.

3. An optimized online algorithm (§5): We further propose an online algorithm to compute approximate output distributions that satisfy user accuracy requirements. Our algorithm employs a suite of optimizations of the GP learning and inference modules to improve performance and accuracy.
Specifically, we propose local inference to increase inference speed while maintaining high accuracy, online tuning to refine function modeling and adapt to input data, and an online retraining strategy to minimize the training overhead. Existing work in machine learning [2, 5, 4, 7] does not provide a sufficient solution to such high-performance online training and inference while meeting user-specified accuracy requirements.

4. Evaluation (§6): We conduct a thorough evaluation of our proposed techniques using both synthetic functions with controlled properties, and real functions from the astrophysics domain. Results show that our GP techniques can adapt to various function complexities, data characteristics, and user accuracy goals. Compared to MC sampling, our approach starts to outperform when function evaluation takes more than 1 ms for low-dimensional functions, e.g., up to 2 dimensions, or when function evaluation takes more than 10 ms for high-dimensional ones, e.g., 10 dimensions. This result applies to real-world expensive functions as we show using the real UDFs from astrophysics. For the UDFs tested, the GP approach can offer up to two orders of magnitude speedup over MC sampling.

2. AN APPROXIMATION FRAMEWORK
In this section, we first propose a general approximate evaluation framework, and then present a baseline approach based on Monte Carlo sampling to compute output distributions of UDFs.

2.1 Approximation Metrics and Objectives
Since UDFs are given as black boxes and have no explicit formula, computing the output of the UDFs can be done only through function evaluation. For uncertain input, computing the exact distribution requires function evaluation at all possible input values, which is impossible when the input is continuous. In this work, we seek approximation algorithms to compute the output distribution given uncertain input. We now present our approximation framework, including accuracy metrics and objectives. We adopt two distance metrics between random variables from the statistics literature []: the discrepancy and Kolmogorov-Smirnov (KS) measures.
We choose these metrics because they are a natural fit for range queries, hence allowing easy interpretation of the output.

Definition 1 (Discrepancy measure). The discrepancy measure, D, between two random variables Y and Y′ is defined as: D(Y, Y′) = sup_{a,b: a≤b} |Pr[Y ∈ [a, b]] − Pr[Y′ ∈ [a, b]]|.

Definition 2 (KS measure). The KS measure (or distance) between two random variables Y and Y′ is defined as: KS(Y, Y′) = sup_y |Pr[Y ≤ y] − Pr[Y′ ≤ y]|.

The values of both measures are in [0, 1]. It is straightforward to show that D(Y, Y′) ≤ 2 KS(Y, Y′). Both measures can be computed directly from the cumulative distribution functions (CDFs) of Y and Y′ to capture their maximum difference. The KS distance considers all one-sided intervals, i.e., (−∞, c] or [c, ∞), while the discrepancy measure considers all two-sided intervals [a, b]. In practice, users may be interested only in intervals of length at least λ, an application-specific error level that is tolerable for the computed quantity. This suggests a relaxed variant of the discrepancy measure, as follows.

Definition 3 (λ-discrepancy). Given the minimum interval length λ, the discrepancy measure D_λ between two random variables Y and Y′ is: D_λ(Y, Y′) = sup_{a,b: b−a≥λ} |Pr[Y ∈ [a, b]] − Pr[Y′ ∈ [a, b]]|.

This measure can be interpreted as: for all intervals of length at least λ, the probability of an interval under Y does not differ from that under Y′ by more than D_λ. These distance metrics can be used to indicate how well one random variable Y′ approximates another random variable Y. We next state our approximation objective, (ε, δ)-approximation, using the discrepancy metric; similar definitions hold for the λ-discrepancy and the KS metric.

Definition 4 ((ε, δ)-approximation). Let Y and Y′ be two random variables. Then Y′ is an (ε, δ)-approximation of Y iff with probability (1 − δ), D(Y, Y′) ≤ ε.

For query Q1, (ε, δ)-approximation requires that with probability (1 − δ), the approximate distribution of GalAge(G.redshift) does not differ from the true one by more than ε in discrepancy. For Q2, there is a selection predicate in the WHERE clause, which truncates the distribution of Distance(G1.pos, G2.pos) to the region [l, u], and hence yields a tuple existence probability (TEP). Then, (ε, δ)-approximation requires that with probability (1 − δ), (i) the approximate distribution of Distance(G1.pos, G2.pos) differs from the true distribution by at most ε in the discrepancy measure, and (ii) the result TEP differs from the true TEP by at most ε.

2.2 A Baseline Approach
We now present a simple, standard technique to compute the query results based on Monte Carlo (MC) simulation. However, as we will see, this approach may require evaluating the UDF many times, which is inefficient for slow UDFs. This inefficiency is the motivation for our new approach presented in Sections 3-5.

A. Computing the Output Distribution. In recent work [23], we used Monte Carlo simulation to compute the output distribution of aggregates on uncertain input. This technique can also be used to compute any UDF Y = f(X). The idea is simple: draw samples from the input distribution, and perform function evaluation to get the output samples. The algorithm is as follows.

Algorithm 1 Monte Carlo simulation
1: Draw m samples x_1, ..., x_m ~ p(X).
2: Compute the output samples, y_1 = f(x_1), ..., y_m = f(x_m).
3: Return the empirical CDF of the output samples, namely Y′: Pr(Y′ ≤ y) = (1/m) Σ_{i ∈ [1..m]} 1_{[y_i, ∞)}(y). 1(·) is the indicator function.
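As a concrete illustration, Algorithm 1 fits in a few lines of Python. This is a minimal sketch (NumPy assumed); the toy UDF x² and the Gaussian input below are our own stand-ins, not from the paper:

```python
import numpy as np

def mc_output_cdf(udf, sample_input, m):
    """Algorithm 1: draw m input samples, evaluate the UDF on each,
    and return the empirical CDF of the output samples Y'."""
    ys = np.sort(np.array([udf(sample_input()) for _ in range(m)]))
    # Pr(Y' <= y) = fraction of output samples not exceeding y
    return lambda y: np.searchsorted(ys, y, side="right") / m

# Toy example: UDF f(x) = x^2 on uncertain input X ~ N(0, 1).
rng = np.random.default_rng(0)
cdf = mc_output_cdf(lambda x: x * x, lambda: rng.normal(0.0, 1.0), m=20_000)
```

The cost driver is the m calls to the UDF itself, which is exactly what makes this baseline expensive for slow UDFs.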
It is shown in [23] that if m = ln(2/δ)/(2ε²), then the output Y′ is an (ε, δ)-approximation of Y in terms of the KS measure, and a (2ε, δ)-approximation in terms of the discrepancy measure. Thus, the number of samples required to reach the accuracy requirement ε is proportional to 1/ε², which is large for small ε. For example, if we use the discrepancy measure and set ε = .02, δ = .05, then the required m is more than 18,000.

B. Filtering with Selection Predicates. In many applications, users are interested in the event that the output is in certain intervals. This can be expressed with a selection predicate, e.g., f(X) ∈ [a, b], as shown in query Q2. When the probability ρ = Pr[f(X) ∈ [a, b]] is smaller than a user-specified threshold θ, this corresponds to an event of little interest and can be discarded. For high performance, we would like to quickly check whether ρ < θ for filtering, which in turn saves the cost of computing the full distribution of f(X). While drawing the samples as in Algorithm 1, we derive a confidence interval for ρ to decide whether to filter. By definition we have ρ = ∫ 1(a ≤ f(x) ≤ b) p(x) dx. Let h(x) = 1(a ≤ f(x) ≤ b) and let m̃ be the number of samples drawn so far (m̃ ≤ m). And let {h_i | i = 1, ..., m̃} be the samples evaluated on h(x). Then, the h_i are iid Bernoulli samples, and ρ can be estimated by ρ̃, computed from the samples: ρ̃ = (Σ_{i=1}^{m̃} h_i) / m̃. The following result, which can be derived from Hoeffding's inequality in statistics, gives a confidence interval for ρ.

Remark 2.1. With probability (1 − δ), ρ ∈ [ρ̃ − ε, ρ̃ + ε], where ε = √(ln(2/δ) / (2m̃)).

If the user specifies a threshold θ to filter low-probability events, and ρ̃ + ε < θ, then we can drop this tuple from the output.

3. EMULATING UDFS WITH GAUSSIAN PROCESSES
In the next three sections, we present an approach that aims to be more efficient than MC sampling by requiring many fewer calls to the UDF. The main idea is that every time we call the UDF, we gain information about the function.
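The early-filtering test of Remark 2.1 is straightforward to implement. A minimal sketch in Python (function and variable names are our own):

```python
import math

def hoeffding_halfwidth(m_tilde, delta):
    """Half-width eps of the (1 - delta) confidence interval for rho,
    via Hoeffding's inequality over m_tilde Bernoulli samples h_i."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * m_tilde))

def can_discard(h_samples, theta, delta):
    """Drop the tuple early if even the upper end of the confidence
    interval for rho = Pr[f(X) in [a, b]] is below the threshold theta."""
    m_tilde = len(h_samples)
    rho_est = sum(h_samples) / m_tilde
    return rho_est + hoeffding_halfwidth(m_tilde, delta) < theta
```

With 1,000 samples and δ = .05 the half-width is about 0.043, so a tuple whose estimated ρ̃ is 0.01 can already be discarded against a threshold θ = 0.1 without drawing the remaining samples.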
Once we have called the UDF enough times, we ought to be able to approximate it by interpolating between the known values to predict the UDF at unknown values. We call this predictor an emulator f̂, which can be used in place of the original UDF f, and is much less expensive for many UDFs. We briefly mention how to build the emulator using a statistical learning approach. The idea is that, if we have a set of function input-output pairs, we can use it as training data to estimate f. In principle, we could build the emulator using any regression procedure from statistics or machine learning, but picking a simple method like linear regression would work poorly on a UDF that did not meet the strong assumptions of that method. Instead, we build the emulator using a learning approach called Gaussian processes (GPs). GPs have two key advantages. First, GPs are flexible methods that can represent a wide range of functions and do not make strong assumptions about the form of f. Second, GPs produce not only a prediction f̂(x) for any point x but also a probabilistic confidence that provides error bars on the prediction. This is vital because we can use it to adapt the training data to meet the user-specified error tolerance. Building an emulator using a GP is a standard technique in the statistics literature; see [5] for an overview. In this section, we provide background on the basic approach to building emulators. In Section 4, we extend to uncertain inputs and aim to quantify the uncertainty of the outputs of UDFs. We then propose an online algorithm to compute UDFs and various optimizations to address accuracy and performance requirements in Section 5.

3.1 Intuition for GPs
We give a quick introduction to the use of GPs as emulators, closely following the textbook [8]. A GP is a distribution over functions; whenever we sample from a GP, we get an entire function for f whose output is the real line. Fig. 1(a) illustrates this in one dimension. It shows three samples from a GP, where each is a function R → R. Specifically, if we pick any input x, then f(x) is a scalar random variable.
This lets us get confidence estimates, because once we have a scalar random variable, we can get a confidence interval in the standard way, e.g., mean ± 2 · standard deviation. To use this idea for regression, notice that since f is random, we can also define conditional distributions over f, in particular, the conditional distribution of f given a set of training points. This new distribution over functions is called the posterior distribution, and it is this distribution that lets us predict new values.

3.2 Definition of GPs
Just as the multivariate Gaussian is an analytically tractable distribution over vectors, the Gaussian process is an analytically tractable distribution over functions. Just as a multivariate Gaussian is defined by a mean and covariance matrix, a GP is defined by a mean function and a covariance function. The mean function m(x) gives the average value E[f(x)] for all inputs x, where the expectation is taken over the random function f. The covariance function k(x, x′) returns the covariance between the function values at two input points, i.e., k(x, x′) = Cov(f(x), f(x′)).

Figure 1: Example of GP regression. (a) prior functions, (b) posterior functions conditioning on training data

A GP is a distribution over functions with a special property: if we fix any vector of inputs (x_1, ..., x_n), the output vector f = (f(x_1), f(x_2), ..., f(x_n)) has a multivariate Gaussian distribution. Specifically, f ~ N(m, K), where m is the vector (m(x_1), ..., m(x_n)) containing the mean function evaluated at all the inputs, and K is a matrix of covariances K_ij = k(x_i, x_j) between all the input pairs. The covariance function has a vital role. Recall that the idea is to approximate f by interpolating between its values at nearby points. The covariance function helps determine which points are nearby. If two points are far away, then their function values should be only weakly related, i.e., their covariance should be near 0. On the other hand, if two points are nearby, then their covariance should be large in magnitude. We accomplish this by using a covariance function that depends on the distance between the input points. In this work, we use standard choices for the mean and covariance functions. We choose the mean function m(x) = 0, which is a standard choice when we have no prior information about the UDF. For the covariance function, we use the squared exponential one, which in its simplest form is k(x, x′) = σ_f² exp(−‖x − x′‖² / (2l²)), where ‖·‖ is the Euclidean distance, and σ_f² and l are its parameters. The signal variance σ_f² primarily determines the variance of the function value at individual points, i.e., x = x′. More important is the lengthscale l, which determines how rapidly the covariance decays as x and x′ move farther apart. If l is small, the covariance decays rapidly, so sample functions from the resulting GP will have many small bumps; if l is large, then these functions will tend to be smoother.
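The squared-exponential covariance is a one-liner in practice. A small sketch (our own code, with σ_f and l exposed as keyword parameters):

```python
import numpy as np

def sq_exp_kernel(x, x_prime, sigma_f=1.0, length_scale=1.0):
    """Squared-exponential covariance:
    k(x, x') = sigma_f^2 * exp(-||x - x'||^2 / (2 l^2))."""
    d2 = np.sum((np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)) ** 2)
    return sigma_f**2 * np.exp(-d2 / (2.0 * length_scale**2))
```

As the text notes, shrinking the lengthscale makes the covariance decay faster: k(0, 1) ≈ 0.61 with l = 1, but ≈ 0.14 with l = 0.5.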
The key assumption made by GP modeling is that at any point x, the function value f(x) can be accurately predicted using the function values at nearby points. GPs are flexible enough to model different types of functions by using an appropriate covariance function [8]. For instance, for smooth functions, squared-exponential covariance functions work well; for less smooth functions, Matern covariance functions work well (where smoothness is defined by mean-squared differentiability). In this paper, we focus on the common squared-exponential functions, which are shown experimentally to work well for the UDFs in our applications (see §6.4). In general, the user can choose a suitable covariance function based on the well-defined properties of the UDFs, and plug it into our framework.

3.3 Inference for New Input Points
We next describe how to use a GP to predict the function outputs at new inputs. Denote the training data by X = {x_i | i = 1, ..., n} for the inputs and f = {f_i | i = 1, ..., n} for the function values. In this section, we assume that we are told a fixed set of m* test inputs X* = (x*_1, x*_2, ..., x*_m*) at which we wish to predict the function values. Denote the unknown function values at the test points by f* = (f*_1, f*_2, ..., f*_m*). The vector (f, f*) is a random vector because each f*_i, i = 1, ..., m*, is random, and by the definition of a GP, this vector simply has a multivariate Gaussian distribution. This distribution is:

[f; f*] ~ N(0, [K(X, X), K(X, X*); K(X*, X), K(X*, X*)]),   (1)

where we have written the covariance as a matrix with four blocks. The block K(X, X*) is an n × m* matrix of the covariances between all training and test points, i.e., K(X, X*)_ij = k(x_i, x*_j). Similar notions hold for K(X, X), K(X*, X), and K(X*, X*). Now that we have a joint distribution, we can predict the unknown test outputs f* by computing the conditional distribution of f* given the training data and test inputs.
Applying the standard formula for the conditional of a multivariate Gaussian yields:

f* | X, X*, f ~ N(m*, Σ*), where   (2)
m* = K(X*, X) K(X, X)^{-1} f
Σ* = K(X*, X*) − K(X*, X) K(X, X)^{-1} K(X, X*)

To interpret m* intuitively, imagine that m* = 1, i.e., we wish to predict only one output. Then K(X*, X) K(X, X)^{-1} is an n-dimensional vector, and the mean m*(x*) is the dot product of this vector with the training values f. So m*(x*) is simply a weighted average of the function values at the training points. A similar intuition holds when there is more than one test point, m* > 1. Fig. 1(b) illustrates the resulting GP after conditioning on training data. As observed, the posterior functions pass through the training points marked by the black dots. The sampled functions also show that the further a point is from the training points, the larger the variance is.

We now consider the complexity of this inference step. Note that once the training data is collected, the inverse covariance matrix K(X, X)^{-1} can be computed once, with a cost of O(n³). Then, given a test point x* (or X* of size 1), inference involves computing K(X*, X) and multiplying matrices, which has a cost of O(n²). The space complexity is also O(n²), for storing these matrices.

3.4 Learning the Hyperparameters
Typically, the covariance functions have some free parameters, called hyperparameters, such as the lengthscale l of the squared-exponential function. The hyperparameters determine how quickly the confidence estimates expand as test points move further from the training data. For example, in Fig. 1(b), if the lengthscale decreases, the spread of the function will increase, meaning that there is less confidence in the predictions. We can learn the hyperparameters using the training data (see Chapter 5, [8]). We adopt maximum likelihood estimation (MLE), a standard technique for this problem. Let θ be the vector of hyperparameters. The log likelihood function is L(θ) := log p(f | X, θ) = log N(f; m, Σ); here we use N to refer to the density of the Gaussian distribution, and m and Σ are defined in Eq. (2).
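Eq. (2) translates directly into a few lines of linear algebra. A minimal sketch (our own code; the small jitter term added to K(X, X) for numerical stability is our addition, not from the paper):

```python
import numpy as np

def gp_posterior(X_train, f_train, X_test, kernel, jitter=1e-10):
    """Posterior mean and covariance at the test inputs (Eq. 2):
    m* = K(X*,X) K(X,X)^{-1} f,
    Sigma* = K(X*,X*) - K(X*,X) K(X,X)^{-1} K(X,X*)."""
    K = np.array([[kernel(a, b) for b in X_train] for a in X_train])
    Ks = np.array([[kernel(a, b) for b in X_train] for a in X_test])   # K(X*, X)
    Kss = np.array([[kernel(a, b) for b in X_test] for a in X_test])   # K(X*, X*)
    K_inv = np.linalg.inv(K + jitter * np.eye(len(X_train)))           # O(n^3), done once
    m_star = Ks @ K_inv @ np.asarray(f_train, dtype=float)
    sigma_star = Kss - Ks @ K_inv @ Ks.T
    return m_star, sigma_star

# Squared-exponential kernel with sigma_f = l = 1.
se = lambda a, b: np.exp(-0.5 * (a - b) ** 2)
mean, cov = gp_posterior([0.0, 1.0], [0.0, 1.0], [0.0, 0.5], se)
```

At a training input the posterior mean reproduces the training value and the posterior variance collapses to (nearly) zero, matching the behavior seen in Fig. 1(b).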
MLE solves for the value of θ that maximizes L(θ). We use gradient descent, a standard method for this task. Its complexity is O(n³) due to the cost of inverting the matrix K(X, X). Gradient descent requires many steps to compute the optimal θ; thus, retraining often has a high cost for large numbers of training points. Note that when the training data X changes, the θ that maximizes the log likelihood L(θ) may also change. Thus, one would need to maximize the log likelihood again to update the hyperparameters. In §5.3, we will discuss retraining strategies that aim to reduce this computation cost.

4. UNCERTAINTY IN QUERY RESULTS
So far in our discussion of GPs, we have assumed that all the input values are known in advance. However, our work aims to compute UDFs on uncertain input. In this section, we describe how to compute output distributions using a GP emulator given uncertain input. We then derive theoretical results to bound the errors of the output using our accuracy metrics.

Table 1: The main notation used in GP techniques.
- f, f̂, f̃: the true function, the mean function of the GP, and a sample function of the GP, respectively.
- f_L, f_S: upper and lower envelope functions of f̃ (with high probability).
- Y, Ŷ, Ỹ: the outputs corresponding to f, f̂, f̃, respectively.
- Y_L, Y_S: the outputs corresponding to f_L, f_S, respectively.
- Ŷ′: estimate of Ŷ using MC sampling (similarly for Y′_L and Y′_S).
- ρ, ρ̂: probabilities of Ỹ and Ŷ, respectively, in a given interval [a, b].
- ρ_U, ρ_L: upper and lower bounds of ρ (with high probability).
- ρ′, ρ̂′, ρ′_U, ρ′_L: MC estimates of ρ, ρ̂, ρ_U, and ρ_L, respectively.
- n: number of training points.
- m: number of MC samples.

Figure 2: GP inference for uncertain input. (a) Computation steps (b) Approximate function with bounding envelope (c) Computing probability for interval [a, b] from CDFs

4.1 Computing the Output Distribution
We first describe how to approximate the UDF output Y = f(X) given uncertain input X. When we approximate f by the GP emulator f̂, we have a new approximate output Ŷ = f̂(X), having CDF Pr[Ŷ ≤ y] = ∫ 1(f̂(x) ≤ y) p(x) dx. This integral cannot be computed analytically. Instead, a simple, offline algorithm is to use Monte Carlo integration by repeatedly sampling input values from p(X). This is very similar to Algorithm 1, except that we call the emulator f̂ rather than the UDF f, which is a cheaper operation for long-running UDFs. The algorithm is detailed below.

Algorithm 2 Offline algorithm using Gaussian processes
1: Collect n training data points, {(x_i, y_i), i = 1..n}, by evaluating y_i = f(x_i).
2: Learn a GP via training on the n training data points, to get GP(f̂(·), k(·, ·)).
3: For uncertain input X ~ p(X):
4: Draw m samples, x_1, ..., x_m, from the distribution p(X).
5: Predict function values at the samples via GP inference to get {(f̂(x_i), σ²(x_i)), i = 1..m}.
6: Construct the empirical CDF of Ŷ from the samples, namely Ŷ′: Pr(Ŷ′ ≤ y) = (1/m) Σ_{i ∈ [1..m]} 1_{[f̂(x_i), ∞)}(y), and return Ŷ′.

In addition to returning the CDF of Ŷ, we also want to return a confidence of how close Ŷ is to the true answer Y. Ideally, we would do this by returning the discrepancy metric, D(Ŷ, Y). But it is difficult to evaluate D(Ŷ, Y) without many calls to the UDF f, which would defeat the purpose of using emulators. So instead we ask a different question, which is feasible to analyze. The GP defines a posterior distribution over functions, and we are using the posterior mean as the best emulator. The question we ask is: how different would the query output be if we emulated the UDF using a random function from the GP, rather than the posterior mean? If this difference is small, it means the GP's posterior distribution is very concentrated. In other words, the uncertainty in the GP modeling is small, and we do not need more training data. To make this precise, let f̃ be a sample from the GP posterior distribution over functions, and define Ỹ = f̃(X) (see Fig. 2a for an illustration of these variables). That is, Ỹ represents the query output if we select the emulator randomly from the GP posterior distribution. The confidence estimate that we will return will be an upper bound on D(Ŷ, Ỹ).

4.2 Error Bounds Using the Discrepancy Measure
We now derive a bound on the discrepancy D(Ŷ, Ỹ). An important point to note is that there are two sources of error here. The first is the error due to Monte Carlo sampling of the input, and the second is the error due to the GP modeling. In the analysis that follows, we bound each source of error individually and then combine them to get a single error bound. To the best of our knowledge, this is the first work to quantify the output distributions of GPs. The main idea is that we will compute a high probability envelope over the GP prediction.
That is, we will find tw functins f L and f S such that f S f f L with prbability at least ( α), fr a given α. Once we have this envelpe n f, then we als have a high prbability envelpe f Ỹ, and can use this t bund the discrepancy. Fig. 2 (parts b & c) gives an illustratin f this intuitin. Bunding Errr fr One Interval. T start, assume that we have already cmputed a high prbability envelpe. Since the discrepancy invlves a supremum ver intervals, we start by presenting upper and lwer bunds n ρ := Pr[Ỹ [a, b] f] fr a single fixed interval [a, b]. Nw, ρ is randm because f is; fr every different functin f we get frm the GP psterir, we get a different ρ. Fr any envelpe (f S, f L), e.g., having the frm ˆf(x)±zσ(x) as shwn in Fig. 2, define Y S = f S(X) and Y L = f L(X). We bund ρ (with high prbability) using Y S and Y L. Fr any tw functins g and h, and any randm vectr X, it is always true that g h implies that Pr[g(X) a] Pr[h(X) a] fr all a. Putting this tgether with f S f f L, we have that ρ = Pr[ f(x) b] Pr[ f(x) a] Pr[f S (X) b] Pr[f L (X) a] In ther wrds, this gives the upper bund: ρ ρ U := Pr[Y S b] Pr[Y L a] (3) Similarly, we can derive the lwer bund: ρ ρ L := max(, Pr[Y L b] Pr[Y S a]) (4) This is summarized in the fllwing result. Prpsitin 4. Suppse that f S and f L are tw functins such that f S f f L with prbability ( α). Then ρ L ρ ρ U, with prbability ( α), where ρ U and ρ L are as in Eqs. 3 and 4. Bunding λdiscrepancy. Nw that we have the errr bund fr ne individual interval, we use this t bund the λdiscrepancy D λ (Ỹ, Ŷ ). Using the bunds f ρ, we can write this discrepancy as D λ (Ỹ, Ŷ ) = sup ρ ˆρ sup max{ ρ L ˆρ, ρ U ˆρ }, [a,b] [a,b] where the inequality applies the result frm Prpsitin 4.. This is prgress, but we cannt cmpute ρ L, ρ U, r ˆρ exactly because they 473
Algorithm 3 Compute λ-discrepancy error bound
1: Construct the empirical CDFs, Ŷ′, Y′_S and Y′_L, from the output samples. Let V be the set of values of these variables.
2: Precompute max_{b′ ≥ b} (Pr[Ŷ′ ≤ b′] − Pr[Y′_L ≤ b′]) and max_{b′ ≥ b} (Pr[Y′_S ≤ b′] − Pr[Ŷ′ ≤ b′]) for each b ∈ V.
3: Consider values for a s.t. [a, a + λ] lies in the support of Ŷ′; a is in V, enumerated from small to large.
4: For a given a:
(a) Get Pr[Ŷ′ ≤ a], Pr[Y′_S ≤ a], and Pr[Y′_L ≤ a].
(b) Get max_{b ≥ a+λ} (Pr[Y′_S ≤ b] − Pr[Ŷ′ ≤ b]). Find the smallest b_0 s.t. Pr[Y′_L ≤ b_0] ≥ Pr[Y′_S ≤ a], and then get max_{b ≥ b_0} (Pr[Ŷ′ ≤ b] − Pr[Y′_L ≤ b]). This is done by using the precomputed values in Step 2.
(c) Compute max(ρ′_U − ρ̂′, ρ̂′ − ρ′_L) from the quantities in (a) and (b). This is the error bound for intervals starting with a.
5: Increase a, repeat Step 4, and update the maximum error.
6: Return the maximum error over all a, which is ɛ_GP.

require integrating over the input X. So we will use Monte Carlo integration once again. We compute Y′_L and Y′_S, as MC estimates of Y_L and Y_S respectively, from the samples in Algorithm 2. We also define (but do not compute) Ỹ′, the random variable resulting from MC approximation of Ỹ with the same samples. An identical argument to that of Proposition 4.1 shows that

D_λ(Ỹ′, Ŷ′) = sup_{[a,b]} |ρ′ − ρ̂′| ≤ sup_{[a,b]} max{|ρ′_L − ρ̂′|, |ρ′_U − ρ̂′|} := ɛ_GP,

where adding a prime means to use Monte Carlo estimates. Now we present an algorithm to compute ɛ_GP. The easiest way would be to simply enumerate all possible intervals. Because Ŷ′, Y′_S, and Y′_L are empirical CDFs over m samples, there are O(m²) possible values for ρ′_U, ρ′_L, and ρ̂′. This can be inefficient for large numbers of samples m, as we observed empirically. Instead, we present a more efficient algorithm to compute this error bound, as shown in Algorithm 3. The main idea is to (i) precompute the maximum differences between the mean function and each envelope function considering decreasing values of b (Step 2), then (ii) enumerate the values of a increasingly and use the precomputed values to bound ρ′ for intervals starting with a (Steps 3–5).
This involves taking a pass through the m points in the empirical CDF of Ŷ′. Then, for a given value of a, we use binary search to find the smallest b_0 s.t. Pr[Y′_L ≤ b_0] ≥ Pr[Y′_S ≤ a]. The complexity of this algorithm is O(m log m). More details are available in [24].

Combining Effects of Two Sources of Error. What we return to the users is the distribution of Ŷ′, from which ρ̂′ can be computed for any interval. As noted, there are two sources of error in ρ̂′: the GP modeling error and the MC sampling error. The latter arises from having Ŷ′, Y′_L, and Y′_S to approximate Ŷ, Y_L, and Y_S respectively. The GP error is from using the mean function to estimate ρ. We can combine these into a single error bound on the discrepancy:

D_λ(Ŷ′, Ỹ) ≤ D_λ(Ŷ′, Ỹ′) + D_λ(Ỹ′, Ỹ).

This follows from the triangle inequality, which D_λ satisfies because it is a metric. Above we just showed that D_λ(Ŷ′, Ỹ′) ≤ ɛ_GP. Furthermore, D_λ(Ỹ′, Ỹ) is just the error due to a standard Monte Carlo approximation, which, as discussed in §2, can be bounded with high probability by, say, ɛ_MC, depending on the number of samples. Also, the two sources of error are independent. This yields the main error bound of this paper, which we state as follows.

Theorem 4.1. If MC sampling is (ɛ_MC, δ_MC)-approximate and GP prediction is (ɛ_GP, δ_GP)-approximate, then the output has an error bound of (ɛ_MC + ɛ_GP) with probability (1 − δ_MC)(1 − δ_GP).

Computing Simultaneous Confidence Bands. Now we describe how to choose a high probability envelope, i.e., a pair (f_S, f_L) that contains f̃ with probability 1 − α. We will use a band of the form f_S = f̂(x) − z_α σ(x) and f_L = f̂(x) + z_α σ(x). The problem is to choose z_α. An intuitive choice would be to choose z_α based on the quantiles of the univariate Gaussian, e.g., choose z_α = 2 for a 95% confidence band. This would give us a pointwise confidence band, i.e., at any point x, we would have f_S(x) ≤ f̃(x) ≤ f_L(x). But we need something stronger. Rather, we want (f_S, f_L) such that the probability that f_S(x) ≤ f̃(x) ≤ f_L(x) at all inputs x simultaneously is at least 1 − α.
An envelope with this property is called a simultaneous confidence band. We will still use a band of the form f̂(x) ± z_α σ(x), but we will need to choose a z_α large enough to get a simultaneous confidence band. Say we set z_α to some value z. The confidence band is satisfied if Z(x) := |f̃(x) − f̂(x)| / σ(x) ≤ z for all x. Therefore, if the probability of sup_{x∈X} Z(x) ≥ z is small, the confidence band is unlikely to be violated. We adopt an approximation of this probability due to [3], i.e.,

Pr[sup_{x∈X} Z(x) ≥ z] ≈ E[ϕ(A_z(X))],   (5)

where the set A_z(X) := {x ∈ X : Z(x) ≥ z} is the set of all inputs where the confidence band is violated, and ϕ(A) is the Euler characteristic of the set A. Also, [3] provides a numerical method to approximate Eq. (5) that works well for small α, i.e., high probability that the confidence band is correct, which is precisely the case of interest. The details are somewhat technical, and are omitted for space; see [3, 24]. Overall, the main computational expense is that the approximation requires computing second derivatives of the covariance function, but we have still found it to be feasible in practice. Once we have computed the approximation to Eq. (5), we compute the confidence band by setting z_α to be the solution of the equation Pr[sup_{x∈X} Z(x) ≥ z_α] ≈ E[ϕ(A_{z_α}(X))] = α.

4.3 Error Bounds for KS Measure

The above analysis can be applied in a similar way if the KS distance is used as the accuracy metric. The main result is as follows.

Proposition 4.2. Consider the mean function f̂(x) and the envelope f̂(x) ± zσ(x). Let f̃(x) be a function in the envelope. Given uncertain input X, let Ŷ = f̂(X) and Ỹ = f̃(X). Then KS(Ỹ, Ŷ) is largest when f̃(x) is at either boundary of the envelope.

Proof sketch. Recall that KS(Ỹ, Ŷ) = sup_y |Pr[Ỹ ≤ y] − Pr[Ŷ ≤ y]|. Let y_m correspond to the supremum in the formula of KS. W.l.o.g., let KS = ∫ (1[f̂(x) ≤ y_m] − 1[f̃(x) ≤ y_m]) p(x) dx > 0. That is, for some x, f̂(x) ≤ y_m < f̃(x). Now suppose there exists some x′ s.t. f̃(x′) < f̂(x′); the KS distance would increase if we replaced f̃(x′) with f̂(x′).
This means KS becomes larger when f̃(x) ≥ f̂(x) for all x; i.e., when f̃(x) lies above f̂(x) for all x. Also, it is intuitive to see that among the functions that lie above f̂(x), f̂(x) + zσ(x) yields the largest KS error, since it maximizes 1[f̂(x) ≤ y] − 1[f̃(x) ≤ y] for all y. (Similarly, we can show that if KS = ∫ (1[f̃(x) ≤ y_m] − 1[f̂(x) ≤ y_m]) p(x) dx > 0, KS is maximized if f̃(x) lies below f̂(x) for all x.)
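To make the KS quantities above concrete, the distance between two empirical output distributions can be computed directly from their samples. A minimal numpy sketch of our own (not code from the paper; `ecdf` and `ks_distance` are illustrative names):

```python
import numpy as np

def ecdf(samples, pts):
    """Empirical CDF of `samples` evaluated at each point in `pts`:
    Pr[X <= y] as a fraction of samples."""
    samples = np.sort(samples)
    return np.searchsorted(samples, pts, side="right") / len(samples)

def ks_distance(a, b):
    """sup_y |F_a(y) - F_b(y)| over the pooled sample points, where the
    supremum between two step-function ECDFs is always attained."""
    pts = np.concatenate([a, b])
    return np.max(np.abs(ecdf(a, pts) - ecdf(b, pts)))
```

In the setting of Proposition 4.2, one would apply `ks_distance` to samples of Ŷ and of each envelope boundary to obtain the KS error bound discussed next.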
Figure 3: Choosing a subset of training points for local inference.

As a result, let Y_S and Y_L be the outputs computed using the lower and upper boundaries f̂(x) − zσ(x) and f̂(x) + zσ(x), respectively. Then, the KS error bound is max(KS(Ŷ, Y_S), KS(Ŷ, Y_L)). We can obtain the empirical variables Ŷ′, Y′_S, and Y′_L via Monte Carlo sampling as before. We also analyze the combined effect of the two sources of error, MC sampling and GP modeling, as for the discrepancy measure. We obtain a similar result: the total error bound is the sum of the two error bounds, ɛ_MC and ɛ_GP. The proof is omitted due to space constraints but is available in [24].

5. AN OPTIMIZED ONLINE ALGORITHM

In Section 4.1, we presented a basic algorithm (Algorithm 2) to compute output distributions when Gaussian processes model our UDFs. However, this algorithm does not satisfy our design constraints, as follows. It is an offline algorithm, since the training data is fixed and learning is performed before inference. Given an accuracy requirement, it is hard to know the number of training points, n, needed beforehand. If we use a larger n, the accuracy is higher, but the performance suffers due to both the training cost O(n³) and the inference cost O(n²). We now seek an online algorithm that is robust to UDFs and input distributions in meeting accuracy requirements. We further optimize it for high performance.

5.1 Local Inference

We first propose a technique to reduce the cost of inference while maintaining good accuracy. The key observation is that the covariance between two points x_i and x_j is small when the distance between them is large. For example, the squared-exponential covariance function decreases exponentially in the squared distance, k(x_i, x_j) = σ_f² exp{−‖x_i − x_j‖² / (2l²)}. Therefore, far training points have only small weights in the weighted average, and hence can be omitted. This suggests a technique that we call local inference, with the steps shown in Algorithm 4.
(We refer to the standard inference technique as global inference.)

Algorithm 4 Local inference
Input: Input distribution p(X). Training data: {(x_i, y_i), i = 1...n}, stored in an R-tree.
1: Draw m samples from the input distribution p(X) and construct a bounding box for the samples.
2: Retrieve a set of training points, called X_L, that have distance to the bounding box less than a maximum distance specified by the local inference threshold Γ (discussed more below).
3: Run inference using X_L to get the function values at the samples. Return the CDF constructed from the inferred values.

Fig. 3 illustrates the execution of local inference to select a subset of training points given the input distribution. The darker rectangle is the bounding box of the input samples, and the lighter rectangle includes the training points selected for local inference.

Choosing the training points for local inference given a threshold. The threshold Γ is chosen so that the approximation error in f̂(x_j), for all samples x_j, is small. That is, f̂(x_j) does not differ much when computed using either global or local inference. Revisit global inference as in Eq. 2. The vector K(X, X)⁻¹ y, called α, can be updated once the training data changes, and stored for later inference. Then, computing f̂(x_j) = K(x_j, X) K(X, X)⁻¹ y = K(x_j, X) α involves a vector dot product. Note that the cost of computing this mean is O(n); the high cost of inference, O(n²), is due to computing the variance σ²(x_j) (see §3.3 for more detail). If we use a subset of training points, we approximate f̂(x_j) with f̂_L(x_j) = K(x_j, X_L) α_L. (α_L is the same as α except that the entries in α that do not correspond to a selected training point are set to 0.) Then the approximation error γ_j, for the sample j, is:

γ_j ≜ |K(x_j, X) α − K(x_j, X_L) α_L| = |K(x_j, X_L̄) α_L̄| = |Σ_{l ∈ L̄} k(x_j, x_l) α_l|,

where X_L̄ are the training points excluded from local inference. Ultimately, we want to compute γ = max_j γ_j, which is the maximum error over all the samples.
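Computed directly, each γ_j is just a kernel-weighted sum over the excluded training points. A brute-force numpy sketch of our own (the squared-exponential kernel form and all names are assumptions; this is exactly the O(mn) computation that the bounding-box bound described next avoids):

```python
import numpy as np

def local_inference_error(x_samples, x_excluded, alpha_excluded, sigma_f, ell):
    """Exact gamma = max_j |sum_{l in excluded} k(x_j, x_l) * alpha_l|.

    x_samples:      (m, d) Monte Carlo input samples
    x_excluded:     (n_ex, d) training points left out of local inference
    alpha_excluded: (n_ex,) entries of alpha for the excluded points
    """
    # pairwise squared distances between samples and excluded training points
    d2 = ((x_samples[:, None, :] - x_excluded[None, :, :]) ** 2).sum(axis=-1)
    # squared-exponential covariance k = sigma_f^2 exp(-d^2 / (2 l^2))
    K = sigma_f ** 2 * np.exp(-d2 / (2.0 * ell ** 2))
    return np.max(np.abs(K @ alpha_excluded))
```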
The cost of computing γ by considering every j is O(mn), as j = 1...m, which is high for large m. We next present a more efficient way to compute an upper bound for γ. We use the bounding box for all the samples x_j as constructed during local inference. For any excluded training point x_l, let x_near be the closest point of the bounding box to x_l and x_far be the furthest point of the bounding box from x_l (see Fig. 3 for an example of these points). For any sample j we have:

k(x_far, x_l) ≤ k(x_j, x_l) ≤ k(x_near, x_l)

Next, by multiplying with α_l, we obtain upper and lower bounds for k(x_j, x_l) α_l. With these inequalities, we can obtain an upper bound γ_upper and a lower bound γ_lower for Σ_{l∈L̄} k(x_j, x_l) α_l, for all j. Then,

γ = max_j γ_j ≤ max(|γ_upper|, |γ_lower|)

Computing this takes time proportional to the number of excluded training points, which is O(n). For each of these points, we need to consider the sample bounding box, which incurs a constant cost when the dimension of the function is fixed. After computing γ, we compare it with the threshold Γ. If γ > Γ, we expand the bounding box for selected training points and recompute γ until we have γ ≤ Γ. Note that Γ should be set to be small compared with the domain of Y, i.e., the error incurred for every test point is small. In §6, we show how to set Γ to obtain good performance.

We mention an implementation detail that makes the bound on γ tighter, which can result in fewer selected training points and improved performance. We divide the sample bounding box into smaller non-overlapping boxes as shown in Fig. 3. Then for each box, we compute its γ, and return the maximum over all these boxes.

Complexity of local inference. Let l be the number of selected training points; the cost of inference is O(l³ + ml² + n): O(l³) is to compute the inverse matrix K(X_L, X_L)⁻¹ needed in the formula of the variance; O(ml²) is to compute the output variance; and O(n) is to compute γ while choosing the local training points. Among these costs, O(ml²) is usually dominant (esp. for high accuracy requirements).
This is an improvement compared to global inference, which has a cost of O(mn²), because l is usually smaller than n.

5.2 Online Tuning

Our objective is to seek an online algorithm for GPs: we start with no training points and collect them over time so that the function model gets more accurate. We can examine each input distribution on-the-fly to see whether more training points are needed given
an accuracy requirement. This contrasts with the offline approach, where the training data must be obtained before inference. To develop an online algorithm, we need to make two decisions. The first decision is how many training points to add. This task is related to the error bounds from §4; that is, we add training points until the upper bound on the error is less than the user's tolerance level. The second decision is where the training points should be, specifically, what input location x_{n+1} to use for the next training point. A standard method is to add new training points where the function evaluation is highly uncertain, i.e., where σ²(x) is large. We adopt a simple heuristic for this: we cache the Monte Carlo samples throughout the algorithm, and when we need more training points, we choose the sample x_j that has the largest predicted variance σ²(x_j), compute its true function value f(x_j), and add it to the training data set. After that, we run inference, compute the error bound again, and repeat until the error bound is small enough. We have experimentally observed that this simple heuristic works well. A complication is that when we add a new training point, the covariance matrix K(X, X) gets bigger, so its inverse needs to be recomputed. Recomputing it from scratch would be expensive, i.e., O(n³). Fortunately, we can update it incrementally using the standard formula for inverting a block matrix (see [24] for details).

5.3 Online Retraining

In our work, the training data is obtained on the fly. Since different inputs correspond to different regions of the function, we may need to tune the GP model to best fit the up-to-date training data, i.e., to retrain. A key question is when we should perform retraining (as mentioned in §3.4). It is preferable that retraining be done infrequently, due to its high cost of O(n³) in the number of training points and the multiple iterations required. The problem of retraining is less commonly addressed in existing work on GPs.
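The incremental update mentioned in §5.2 — growing K(X, X)⁻¹ by one training point without an O(n³) re-inversion — can be sketched with the block-matrix (Schur complement) inversion identity. A numpy illustration of our own, not the paper's code:

```python
import numpy as np

def grow_inverse(K_inv, k_vec, c):
    """Given K_inv = inverse of the n x n covariance K, the covariances
    k_vec = [k(x_{n+1}, x_i)] and the scalar c = k(x_{n+1}, x_{n+1}),
    return the inverse of the (n+1) x (n+1) covariance in O(n^2) time
    via block inversion (s below is the Schur complement of K)."""
    n = K_inv.shape[0]
    v = K_inv @ k_vec                        # K^{-1} k
    s = c - k_vec @ v                        # scalar Schur complement
    out = np.empty((n + 1, n + 1))
    out[:n, :n] = K_inv + np.outer(v, v) / s
    out[:n, n] = -v / s
    out[n, :n] = -v / s
    out[n, n] = 1.0 / s
    return out
```

Each online-tuning step can maintain the inverse this way; a full re-inversion is only unavoidable when retraining changes the hyperparameters, since that alters every entry of K.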
Since retraining involves maximizing the likelihood function L(θ), we make this decision by examining the likelihood function. Recall also that the numerical optimizer, e.g., gradient descent, requires multiple iterations to find the optimum. A simple heuristic is to run training only if the optimizer is able to make a big step during its very first iteration. Given the current hyperparameters θ, run the optimizer for one step to get a new setting θ′, and continue with training only if ‖θ′ − θ‖ is larger than a preset threshold ∆θ. In practice, we have found that gradient descent does not work well with this heuristic, because it does not move far enough during each iteration. Instead, we use a more sophisticated heuristic based on a numerical optimizer, Newton's method, which uses both the first and the second derivatives of L(θ). Mathematical derivation shows that the second derivatives of L(θ) are:

L″(θ_j) = ½ tr[ (−K⁻¹ (∂K/∂θ_j) K⁻¹ y yᵀ K⁻¹ − K⁻¹ y yᵀ K⁻¹ (∂K/∂θ_j) K⁻¹ + K⁻¹ (∂K/∂θ_j) K⁻¹) (∂K/∂θ_j) + (K⁻¹ y yᵀ K⁻¹ − K⁻¹) (∂²K/∂θ_j²) ],

where tr[·] is the trace of a matrix. ∂K/∂θ_j and ∂²K/∂θ_j² can be updated incrementally. (The details are shown in [24].)

5.4 A Complete Online Algorithm

We now put together all of the above techniques to form a complete online algorithm to compute UDFs on uncertain data using GPs. The main idea is: starting with no training data, given an input distribution, we use the online tuning of §5.2 to obtain more training data, and run inference to compute the output distribution. The local inference of §5.1 is used for improved performance. When some training points are added, we use our retraining strategy to decide whether to re-learn the GP model by updating its hyperparameters.

Algorithm 5 OLGAPRO: Compute output distribution using Gaussian process with optimizations
Input: Input tuple X ~ p(X). Training data: T = {(x_i, y_i), i = 1..n}; hyperparameters of the GP: θ. Accuracy requirement for the discrepancy measure: (ɛ, δ).
1: Draw m samples for X, {x_j, j = 1..m}, where m depends on the sampling error bound ɛ_MC < ɛ.
2: Compute the bounding box for these samples.
Retrieve a subset of training points for local inference given the threshold Γ (see §5.1). Denote this set of training points T_Γ.
3: repeat
4: Run local inference using T_Γ to get the output samples {(f̂(x_j), σ²(x_j)), j = 1..m}.
5: Compute the discrepancy error bound D_upper using these samples (see §4.2).
6: If D_upper > ɛ_GP, add a new training point at the sample with the largest variance, i.e., (x_{n+1}, f(x_{n+1})) (see §5.2), and insert this point into the training data index. Set n := n + 1.
7: until D_upper ≤ ɛ_GP
8: if one or more training points were added then
9: Compute the log likelihood L(θ) = log p(y|X, θ) and its first and second derivatives, and estimate δθ (see §5.3).
10: if δθ ≥ ∆θ then
11: Retrain to get the new hyperparameters θ′. Set θ := θ′.
12: Rerun inference.
13: end if
14: end if
15: Return the distribution of Y, computed from the samples {f̂(x_j)}.

Our algorithm, which we name OLGAPRO, standing for ONline GAussian PROcess, is shown as Algorithm 5. The objective is to compute an output distribution that meets the user-specified accuracy requirement under the assumption of GP modeling. The main steps of the algorithm are: (a) Compute the output distribution by sampling the input and inferring with the Gaussian process (Steps 1–4). (b) Compute the error bound (Steps 5–7). If this error bound is larger than the allocated error bound, use online tuning to add a new training point. Repeat this until the error bound is acceptable. (c) If one or more training points have been added, decide whether retraining is needed and, if so, perform retraining (Steps 8–12).

Parameter setting. We further consider the parameters used in the algorithm. The choice of Γ for local inference in Step 2 is discussed in §5.1. The allocation of the two sources of error, ɛ_MC and ɛ_GP, is according to Theorem 4.1: ɛ = ɛ_MC + ɛ_GP. Then our algorithm automatically chooses the number of samples m to meet the accuracy requirement ɛ_MC (see §2 for the formula).
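The excerpt defers the sample-size formula to the paper's §2, which is not reproduced here. As an assumption, if the standard Dvoretzky–Kiefer–Wolfowitz (DKW) inequality is used to bound the ECDF error, the choice of m looks like this:

```python
import math

def mc_sample_size(eps_mc, delta_mc):
    """Smallest m such that sup_y |F_m(y) - F(y)| <= eps_mc holds with
    probability >= 1 - delta_mc, via the DKW inequality
        Pr[sup_y |F_m(y) - F(y)| > eps] <= 2 exp(-2 m eps^2).
    Assumed formula -- the paper's own derivation is not in this excerpt."""
    return math.ceil(math.log(2.0 / delta_mc) / (2.0 * eps_mc ** 2))
```

For example, (ɛ_MC, δ_MC) = (0.05, 0.05) gives m = 738 samples, and halving ɛ_MC roughly quadruples m, consistent with m growing as 1/ɛ²_MC.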
For retraining, setting the threshold ∆θ (mentioned in §5.3) smaller will trigger retraining more often but potentially make the model more accurate, while setting it high can give inaccurate results. In §6, we experimentally show how to set these parameters efficiently.

Complexity. The complexity of local inference is O(l³ + ml² + n), as shown in §5.1. Computing the error bound takes O(m log m) (see §4.2). And retraining takes O(n³). The number of samples m is O(1/ɛ²_MC), while the number of training points n depends on ɛ_GP and the UDF itself. The unit cost is basic math operations, in contrast to complex function evaluations as in standard MC simulation. This is because when the system converges, we seldom need to add more training points, or to call function evaluation. Also, at convergence, the high cost of retraining can be avoided; the computation needed is for inference and computing error bounds.
Hybrid solution. We now consider a hybrid solution that combines our two approaches: direct MC sampling, and GP modeling and inference. The need for a hybrid solution arises since functions can vary in their complexity and evaluation time. Therefore, when given a black-box UDF, we explore these properties on the fly and choose the better solution. We can measure the function evaluation time while obtaining training data. We then run GPs to convergence, measure their inference time, and compare the running times of the two approaches. Due to space constraints, the details of this solution are deferred to [24]. In §6, we conduct experiments to determine the cases where each approach can be applied.

5.5 Online Filtering

In the presence of a selection predicate on the UDF output, similar to the filtering technique for Monte Carlo simulation (§2), we also consider online filtering when sampling with a Gaussian process. Again, we consider selection with the predicate a ≤ f(X) ≤ b. Let (f̂(x), σ²(x)) be the estimate at any input point x. With the GP approximation, the tuple existence probability ρ is approximated with ρ̂ = Pr[f̂(X) ∈ [a, b]]. This is exactly the quantity that we bounded in §4.2, where we showed that ρ ≤ ρ_U. So in this case, we filter tuples whose estimate of ρ_U is less than our threshold. Again, since ρ_U is computed from the samples, we can check this online for the filtering decision as in §2.

6. PERFORMANCE EVALUATION

In this section, we evaluate the performance of our proposed techniques using both synthetic functions and data with controlled properties, and real workloads from the astrophysics domain.

6.1 Experimental Setup

We first use synthetic functions with controlled properties to test the performance and sensitivity of our algorithms. We now describe the settings of these functions, the input data, and the parameters used.

A. Functions. We generate functions (UDFs) of different shapes in terms of bumpiness and spikiness.
A simple method is to use Gaussian mixtures [ ] to simulate various function shapes (which should not be confused with the input and output distributions of the UDF, and by no means favors our GP approach). We vary the number of Gaussian components, which dictates the number of peaks of a function. The means of the components determine the domain, and their covariance matrices determine the stretch and bumpiness of the function. We denote the function dimensionality by d; this is the number of input variables of the function. We observe that in real applications, many functions have low dimensionality, e.g., 1 or 2 for astrophysics functions. For evaluation purposes, we vary d in a wider range of [1, 10]. Besides the shape, a function is characterized by its evaluation time, T, which we vary in the range 1 µs to 1 s.

B. Input Data. By default, we consider uncertain data following Gaussian distributions, i.e., the input vector has a distribution characterized by N(µ_I, Σ_I). µ_I is drawn from the given support of the function, [L, U]. Σ_I determines the spread of the input distributions. For simplicity, we assume the input variables of a function are independent, but supporting correlated input is not harder: we just need to sample from the joint distributions. We also consider other distributions, including exponential and Gamma. We note that handling other types of distributions is similar for the same reason (the difference is the cost of sampling).

C. Accuracy Requirement. We use the discrepancy measure as the accuracy metric in our experiments. The user specifies the accuracy requirement (ɛ, δ) and the minimum interval length λ. λ is set to be a small percentage (e.g., 1%) of the range of the function. This requirement means that with probability (1 − δ), for any interval of length at least λ, the probabilities of an interval computed from the approximate and true output distributions do not differ from each other by more than ɛ. For the GP approach, the error bound ɛ is allocated to two sources of error, the GP error bound ɛ_GP and the sampling error bound ɛ_MC, where ɛ = ɛ_GP + ɛ_MC.
We also distribute δ so that 1 − δ = (1 − δ_GP)(1 − δ_MC). Our default settings are as follows: the domain of the function [L, U] = [0, 10], input standard deviation σ_I = 0.5, function evaluation time T = 1 ms, and accuracy requirement (ɛ = 0.1, δ = 0.05). The reported results are averaged from 500 output distributions or from when the algorithm converges, whichever is larger.

6.2 Evaluating our GP Techniques

We first evaluate the individual techniques employed in our Gaussian process algorithm, OLGAPRO. The objective is to understand and set the various internal parameters of our algorithm.

Profile 1: Accuracy of function fitting. We first choose four two-dimensional functions of different shapes and bumpiness (see Fig. 4). These functions are the four combinations of (i) one or five components and (ii) large or small variance of the Gaussian components, which we refer to as F1, F2, F3, and F4. First, we check the effectiveness of GP modeling. We vary the number of training points n and run basic global inference at test points. Fig. 5(a) shows the relative errors for inference, i.e., |f̂(x) − f(x)| / f(x), evaluated at a large number of test points. The simplest function F1, with one peak and being flat, needs only a small number of training points, e.g., 30, to be well approximated. In contrast, the most bumpy and spiky function F4 requires the largest number of points, n > 300, to be accurate. The other two functions are in between. This confirms that the GP approach can model functions of different shapes well; however, the number of training points needed varies with the function. In later experiments, we will show that OLGAPRO can robustly determine the number of training points needed online.

Profile 2: Behavior of the error bound. We next test the behavior of our discrepancy error bound, which is described in §4.2 and computed using Algorithm 3. We compute the error bounds and measure the actual errors. Fig. 5(b) shows the result for the function F4, which confirms that the error bounds are actual upper bounds and hence indicates the validity of GP modeling.
More interestingly, it shows how tight the bounds are (about 2 to 4 times the actual errors). As λ gets smaller, more intervals are considered for the discrepancy measure; thus the errors and error bounds, being suprema over a larger set of intervals, get larger. We tested the other functions and observed the same trends. In the following experiments, we use a stringent requirement, setting λ to be 1% of the function range.

Profile 3: Allocation of two sources of error. We also examine the allocation of the user-specified error bound ɛ to the errors from GP modeling and MC sampling, ɛ_GP and ɛ_MC, as in Theorem 4.1. The details are omitted due to space constraints, but are discussed in [24]. In general, we set ɛ_MC to be 0.7ɛ for good performance.

In the next three experiments, we evaluate three key techniques employed in our GP approach. The default function is F4.

Expt 1: Local inference. We first consider our local inference technique as described in §5.1. We compare the accuracy and running time of local inference with those of global inference. For now, we fix the number of training points to compare the performance of the two inference techniques. We vary the threshold Γ of local inference from 0.1% to 2% of the function range. Recall that setting Γ small corresponds to using more training points, and hence is similar to global inference. Our goal is to choose a setting of Γ such that local inference has accuracy similar to global inference while being faster. Figs. 5(c) and 5(d) show the accuracy and running time,
Figure 4: A family of functions of different smoothness and shape used in evaluation ((a) Funct1, (b) Funct2, (c) Funct3, (d) Funct4).

respectively. We see that for most of the Γ values tested, local inference is as accurate as global inference while offering a speedup of 2 to 4 times. We repeated this experiment for the other functions and observed that for less bumpy functions, the speedup of local inference is less pronounced, but the accuracy is always comparable. This is because for smooth functions, far training points still have a high weight in inference. In general, we set Γ to about 0.005× the function range, which results in good accuracy and improved running time.

Expt 2: Online tuning. In §5.2, we proposed adding training points on-the-fly to meet the accuracy requirement. We now evaluate our heuristic of choosing the sample with the largest variance to add. We compare it with the two following heuristics. Given an input distribution, a simple one is to choose a sample of the input at random. Another heuristic is what we call optimal greedy, which considers all samples, simulates adding each of them to compute a new error bound, and then picks the sample yielding the largest error-bound reduction. This is only hypothetical, since it is prohibitively expensive to simulate adding every sample. For this experiment only, we assume that each input has 400 samples, so that optimal greedy is feasible. We start with just 25 training points and add more when necessary. Fig. 5(e) shows the accumulated number of training points added over time (for performance, we restrict that no more than 10 points can be added for every input). As observed, our technique using the largest variance requires fewer training points, and hence runs faster, than randomly adding points. Also, it is close to optimal greedy while being much faster, so it can be run online.

Expt 3: Retraining strategy. We now examine the performance of our retraining strategy (see §5.3). We vary our threshold ∆θ for retraining and compare this strategy with two other strategies: eager training, when one or more training points are added, and no training.
Again, we start with a small number of training points and add more using online tuning. Figs. 5(f) and 5(g) show the accuracy and running time, respectively. As expected, setting ∆θ smaller means retraining more often and is similar to eager retraining, while a larger ∆θ means less retraining. We see that setting ∆θ no less than 0.5 gives the best performance, as fewer retraining calls are needed while the hyperparameters are still good estimates. We repeated this experiment with the other functions and saw that conservatively setting ∆θ = 0.5 gives good performance for this set of functions. In practice, ∆θ can be chosen in reference to the hyperparameter values.

6.3 GP versus Monte Carlo Simulation

We next examine the performance of our complete online algorithm, OLGAPRO (Algorithm 5). The internal parameters are set as above. We also compare this algorithm with the MC approach.

Expt 4: Varying user-specified ɛ. We run the GP algorithm for all four functions F1 to F4. We vary ɛ in the range of [0.02, 0.2]. Fig. 5(h) shows the running time for the four functions. (We verified that the accuracy requirement ɛ is always satisfied, and omit the plot due to space constraints.) As ɛ gets smaller, the running time increases. This is due to the fact that the number of samples is proportional to 1/ɛ²_MC. Besides, a small ɛ_GP requires more training points, hence a higher cost for inference. This experiment also verifies the effect of the function complexity on the performance. A flat function like F1 needs many fewer training points than a bumpy, spiky function like F4; thus the running times differ by about two orders of magnitude. We also repeated this experiment for other input distributions, including Gamma and exponential distributions, and observed very similar results, which is due to our general approach of working with input samples. Overall, our algorithm can robustly adapt to the function complexity and the accuracy requirement.

Expt 5: Varying evaluation time T. The tradeoff between the GP and MC approaches mainly lies in the function evaluation time T. In this experiment, we fix ɛ = 0.1
and vary T from 1 µs to 1 s. Fig. 5(i) shows the running time of the two approaches for all four functions. Note that the running time of MC sampling is similar for all functions, hence we just show one line. As observed, the GP approach starts to outperform the sampling approach when function evaluation takes longer than 0.1 ms for simple functions like F1, and up to 1 ms for complex functions like F4. We also note that our GP approach is almost insensitive to the function evaluation time, which is incurred only during the early phase. After convergence, function evaluation occurs only infrequently. This demonstrates the applicability of the GP approach for long-running functions. This result also argues for the use of a hybrid solution as described in §5.4. Since the function complexity is unknown beforehand, so is the number of training points. The hybrid solution can be performed to automatically pick the better approach based on the function's complexity and evaluation time; e.g., the GP method is used for simple functions with evaluation time of 0.1 ms or above, and only for complex functions with longer times.

Expt 6: Optimization for selection predicates. We examine the performance of online filtering when there is a selection predicate. As shown in §2 and §5.5, this can be used for both direct MC sampling and sampling with a GP. We vary the selection predicate, which in turn affects the rate at which the output is filtered. We decide to filter output whose tuple existence probability is less than 0.1. Fig. 5(j) shows the running time. As seen, when the filtering rate is high, online filtering helps reduce the running time, by factors of 5 and 30 for MC and GP respectively. We observe that the GP approach has a higher speedup because, besides processing fewer samples, it results in a GP model with fewer training points, i.e., a smaller inference cost. Fig. 5(k) shows the false positive rates, i.e., tuples that should be filtered but are not during the sampling process. We observe that this rate is low, always less than 1%. The false negative rates are zero or negligible (less than 0.5%).
Expt 7: Varying function dimensionality d. We consider different functions with the dimension d varying from 1 to 10. Fig. 5(l) shows the running time of these functions for both approaches. Since the running time using GPs is insensitive to the function evaluation time, we show only one line, for T = 1 s, for clarity. We observe that with GPs, high-dimensional functions incur a high cost, because more training points are needed to capture a larger region. Even with a high dimension of 10, the GP approach still outperforms MC when the function evaluation time reaches 0.1 s.