DPro: A Probabilistic Approach for Hidden Web Database Selection Using Dynamic Probing

DPro: A Probabilistic Approach for Hidden Web Database Selection Using Dynamic Probing

Victor Z. Liu, Richard C. Luo, Junghoo Cho, Wesley W. Chu
UCLA Computer Science Department, Los Angeles, CA
{vicliu, lc, cho, wwc}@cs.ucla.edu

Abstract

An ever increasing amount of valuable information is stored in Web databases, hidden behind search interfaces. To save the user's effort in manually exploring each database, metasearchers automatically select the most relevant databases to a user's query [, 5, 6,, 6]. Existing methods use a pre-collected summary of each database to estimate its relevancy to the query, and return the databases with the highest estimation. While this is a great starting point, the existing methods suffer from two drawbacks. First, because the estimation can be inaccurate, the returned databases are often wrong. Second, the system does not try to improve the quality of its answer by contacting some databases on-the-fly (to collect more information about the databases and select databases more accurately), even if the user is willing to wait for some time to obtain a better answer. In this paper, we introduce the notion of dynamic probing and study its effectiveness under a probabilistic framework: under our framework, a user can specify how correct the selected databases should be, and our system automatically contacts a few databases to satisfy the user-specified correctness. Our experiments on 20 real hidden Web databases indicate that our approach significantly improves the correctness of the returned databases at the cost of a small number of database probings.

1 Introduction

An ever increasing amount of information on the Web is available through search interfaces. This information is often called the Hidden Web or Deep Web [1] because traditional search engines cannot index it using existing technologies [10, 1]. Since the majority of Web users rely on traditional search engines to discover and access information on the Web, the Hidden Web is practically inaccessible to most users and hidden from them. Even if users are aware of a certain part of the Hidden Web, they need to go through the painful process of issuing queries to all potentially relevant Hidden Web databases and investigating the results manually. On the other hand, the information in the Hidden Web is estimated to be significantly larger and of higher quality than the Surface Web indexed by search engines [1].

(Footnote 1: We call a collection of documents accessible through a Web search interface a Hidden-Web database. PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi) is one example.)

keyword | document frequency in db1 | document frequency in db2
cancer  | 10,000 | 5,000
kidney  | 7,000  | 10,000
breast  | 2,000  | 3,500
liver   | 00     | 4,000

Figure 1: (Keyword, document frequency) table. The document frequency of a keyword in db_i is the number of documents in db_i that use the keyword.

In order to assist users in accessing the information in the Hidden Web, recent efforts have focused on building a metasearcher or a mediator that automatically selects the most relevant databases to a user's query [, 5, 4, 5, 6, 8,, 4, 5, 6]. In this framework, the metasearcher maintains a summary or statistics on each database, and consults the summary to estimate the relevancy of each database to a query. For example, Gravano et al. [14, 16] maintain (keyword, document frequency) pairs to estimate the databases with the largest number of matching documents. We illustrate the basic idea of the existing approaches using the following example.

Example 1. A metasearcher mediates two Hidden-Web databases, db1 and db2. Given a user's query q, the goal of the metasearcher is to return the database with the largest number of matching documents. The metasearcher maintains the (keyword, document frequency) table shown in Figure 1.
For example, the first row shows that 10,000 documents in db1 contain the word cancer while 5,000 documents in db2 contain cancer. We assume that each of db1 and db2 has a total of 20,000 documents. Given a user query breast cancer, the metasearcher may select the database with more matching documents in the following way: From the summary we know that 2,000/20,000 of the documents in db1 contain the word breast and 10,000/20,000 of them contain the word cancer. Then, assuming that the words breast and cancer are independently distributed, db1 will have 20,000 × (2,000/20,000) × (10,000/20,000) = 1,000 documents with both the words breast and cancer. Similarly, db2 will have 20,000 × (3,500/20,000) × (5,000/20,000) = 875 matching documents. Based on this estimation, the metasearcher returns db1 to the user. Reference [18] explains in detail how we may construct this table from hidden Web databases.
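To make the arithmetic in Example 1 concrete, the following Python sketch applies the same independence assumption to a (keyword, document frequency) summary. The summary values mirror Figure 1; the function name and data layout are illustrative assumptions, not part of any released system.

# A minimal sketch of the independence estimator used in Example 1:
# the estimated number of matching documents is |db| * prod(df(t)/|db|)
# over the query terms t. The summary below mirrors Figure 1.

summary = {
    "db1": {"size": 20000, "df": {"cancer": 10000, "kidney": 7000, "breast": 2000}},
    "db2": {"size": 20000, "df": {"cancer": 5000, "kidney": 10000, "breast": 3500}},
}

def estimate_relevancy(db, query_terms):
    """Estimated document frequency of the conjunctive query, assuming
    the query terms are independently distributed in the database."""
    size = summary[db]["size"]
    estimate = size
    for term in query_terms:
        estimate *= summary[db]["df"].get(term, 0) / size
    return estimate

if __name__ == "__main__":
    for db in ("db1", "db2"):
        print(db, estimate_relevancy(db, ["breast", "cancer"]))
    # db1 -> 1000.0, db2 -> 875.0, so db1 is returned, as in Example 1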

In this paper, we improve upon this existing framework by introducing the concepts of probabilistic correctness and dynamic probing for Hidden Web database selection. One of the main weaknesses of the existing method is that the selected databases are often not the most relevant to the user's query, because the relevancy of a database is estimated based on a pre-collected summary. For instance, in the above example, the words breast and cancer may not be independently distributed, and db2 may actually contain more matching documents than db1. Given a wrong answer, the user ends up wasting a significant amount of time on the irrelevant databases. A recent study shows that investigating irrelevant Web pages is a major cause of users' wasted time on the Web [3].

One way of addressing this weakness is to issue the user's query q to all the databases that the metasearcher mediates, and select the best ones based on the actual result returned by each database. For instance, the metasearcher may issue the query breast cancer to both db1 and db2 in the above example, obtain the number of matching documents reported by them, and select the one with more matches. While this approach can improve the correctness of database selection, its huge network and time overhead makes it impractical when metasearchers mediate a large number of Hidden-Web databases (often thousands of them [1]).

In this paper, we develop a probabilistic approach to use dynamic probing (issuing the user query to the databases on the fly) in a systematic way, so that the correctness of database selection is significantly improved while the metasearcher contacts the minimum number of databases. In our approach, the user can specify the desired correctness of database selection (e.g., "more than 9 out of the 10 selected databases should be the actual top 10 databases"), and the metasearcher decides how many and which databases to contact based on the user's specification. Informally, we may consider the user-specified correctness as a knob: When the user does not care about the answer's correctness, our approach becomes identical to the existing ones (no dynamic probing). As the user desires higher correctness, our approach will contact more databases. Our experimental results reveal that dynamic probing often returns the best databases with a small number of probings.

Dynamic probing of Web databases introduces many interesting challenges. For example, how can we guarantee a certain level of correctness? How can we maximize the correctness with the minimal number of dynamic probings? Which databases should we probe? This paper studies these problems using a probabilistic approach. Our solution is based on the following observations: Although the actual relevancy of a Web database may deviate from an initial inaccurate estimation, the way it deviates follows a probabilistic distribution that can be observed. Such a distribution usually centers around the estimated value. If we roughly know this actual relevancy distribution for each database, then we can guess how likely we have selected the actual top-k databases using these distributions. Furthermore, by probing a few databases, we can obtain their actual relevancy values and select the top-k databases with higher confidence. Our task of dynamic probing thus becomes using the minimum number of probings to accomplish the user-specified correctness level. We will formalize these notions, e.g. the probabilistic distribution and the correctness of an answer, in Section 2.

Overall, we believe our paper makes the following contributions:

1. A probabilistic model for relevancy estimation: With the probabilistic model, we can quantify the correctness of database selection. (Section 2)
2. Using dynamic probing to increase the correctness of database selection: We keep on probing till the certainty exceeds a user-specified level. (Section 3)
3. Probing strategies: Our optimal strategy uses the minimum number of database probings to reach the required level of certainty. (Section 3.1) We also present a greedy strategy that can identify top-k databases at reasonable computational complexity. (Section 3.2)
4. Experimental validation: We validate our algorithms using real Hidden Web databases, under various experimental settings. (Section 5) The results reveal that dynamic probing significantly improves the correctness of database selection with a reasonably small number of probings. For example, with a single probing, we can improve the correctness of an answer by 70% in certain cases.

2 A Probabilistic Approach for Dynamic Probing

To select the most relevant databases for a query and make our selection as correct as possible, we need to fully understand the relevancy of a database to a query, and the correctness of a set of selected databases. In this section, we first define the relevancy metric of a database. We then introduce the notion of expected correctness for a top-k answer set. Finally, we explain the cost model for dynamic probing.

2.1 Database relevancy and probing

Relevancy of a database. Intuitively, we consider a database relevant to a query if the database contains enough documents pertinent to the query topic. The following are two possible definitions that reflect this notion of relevancy.

Document-frequency-based: A database is considered the most relevant if it contains the highest number of matching documents [14, 16]. This number of matching documents is referred to as the document frequency of the query in the database.

Document-similarity-based: A database is considered the most relevant if it contains the most similar document(s) to the query [5,, 5]. Query-document similarity is often computed using the standard Cosine function [].

Relevancy estimation. A metasearcher has to estimate the approximate relevancy of a database to a query using a pre-collected summary. Note that this estimate may or may not be the same as the actual relevancy of the database. We refer to the estimated relevancy of a database db to a query q as r̂(db, q). To make our later discussion concrete, we now briefly illustrate how we may estimate the relevancy of a database under the document-frequency-based metric [14]. Note, however, that our framework is independent of the particular relevancy metric and the estimator used by a metasearcher. Our approach can be used for any metric and estimator combination. In [14, 16], Gravano et al. compute r̂(db, q) by assuming that the query terms q = {t_1, ..., t_m} are independently distributed in db. Using their independence estimator, r̂(db, q) can be computed as follows:

r̂(db, q) = |db| · ∏_{t_i ∈ q} ( r̂(db, t_i) / |db| )    (1)

where |db| is the size of db and r̂(db, t_i) is the number of documents in db that use t_i. Note that Eq. (1) assumes that r̂(db, t_i) is available to the metasearcher for every term t_i and every database db. In practice, however, a hidden Web database seldom exports such an exhaustive content summary to the metasearcher. Reference [18] proposes an approximation method to guess the r̂(db, t_i) values for all query terms.

Database probing. We define probing a database as the operation of issuing a particular query to the database and gathering the necessary information to evaluate its exact relevancy to the query. Depending on the relevancy metric, we need to collect different information during probing. For example, under the document-frequency-based metric, we need to collect the number of matching documents from the probed database, while under the document-similarity-based metric, we need to collect the similarity value of the most similar document(s) in the probed database. For most existing Hidden Web databases, we note that it is possible to get their exact relevancy through simple operations. For instance, many Hidden Web databases report the number of matching documents in their answer page to a query, so we can easily compute their exact document-frequency-based relevancy. Also, under the document-similarity-based metric, we may download the first document that a Hidden Web database returns, and then analyze its content to compute its cosine similarity. In the remainder of this paper, we refer to the exact relevancy of a database db to a query q as r(db, q). Thus, after probing db, its estimated relevancy r̂(db, q) becomes r(db, q).

2.2 Correctness metric for the top-k databases

Our goal is to find the k databases that are most relevant to a query. We represent this set of correct top-k answers as DB_k^top. We refer to the set of k databases selected by a particular selection algorithm as DB_k. We may define the correctness of DB_k compared to DB_k^top in one of the following ways.

Absolute correctness: We consider DB_k correct only when it contains all of DB_k^top.

Definition 1. The absolute correctness of DB_k compared to DB_k^top is
Cor_a(DB_k) = 1 if DB_k = DB_k^top, and 0 otherwise.

Partial correctness: We give partial credit to DB_k if it contains some of DB_k^top.

Definition 2. The partial correctness of DB_k compared to DB_k^top is
Cor_p(DB_k) = |DB_k ∩ DB_k^top| / k

Under this definition, the correctness value of a top-5 answer set is 0.4 if it contains 2 of the actual top 5 databases. We study both of these metrics in this paper. For the reader's convenience, we summarize the notation that we have introduced in Figure 2. Some of the symbols will be discussed later.

Symbol: Meaning
DB: {db_1, ..., db_n}, the total set of databases
q: The user's query
k: The number of top databases asked for by the user
r(db, q): The actual relevancy of db for q
r̂(db, q): The estimated relevancy of db for q
DB_k: A set of k databases selected by a particular algorithm, DB_k ⊆ DB
DB_k^top: The set of correct top-k databases
Cor_a(DB_k): Absolute correctness metric for DB_k
Cor_p(DB_k): Partial correctness metric for DB_k
DB_P: The set of databases that have already been probed
DB_U: The set of databases that have not been probed, i.e. DB − DB_P
PRD: Probabilistic Relevancy Distribution
P(r(db, q) ≤ α | r̂(db, q) = β): The probability of r(db, q) being lower than α, given the estimation r̂(db, q) = β. This probability is given by the PRD. α and β are specific values
E[Cor(DB_k)]: The expected correctness of DB_k, where Cor can be Cor_a or Cor_p
t: The user-specified threshold for the answer's expected correctness
c: The cost of probing a database
ECost(DB_U): The expected probing cost on the set of unprobed databases, DB_U
err(r̂, r): The error function computing the difference between r̂(db, q) and r(db, q)

Figure 2: Notation used throughout the paper.
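As a quick illustration of Definitions 1 and 2, the short Python sketch below computes both correctness values for a hypothetical selected set against a hypothetical true top-k set; the sets are made up for illustration only.

# Sketch of the two correctness metrics from Definitions 1 and 2.

def cor_a(selected, top_k):
    """Absolute correctness: 1 only if the selected set equals the true top-k."""
    return 1.0 if set(selected) == set(top_k) else 0.0

def cor_p(selected, top_k):
    """Partial correctness: fraction of the true top-k that was selected."""
    return len(set(selected) & set(top_k)) / len(top_k)

# Hypothetical example: 2 of the actual top 5 were selected.
selected = ["db1", "db2", "db3", "db4", "db5"]
true_top = ["db1", "db2", "db6", "db7", "db8"]
print(cor_a(selected, true_top))  # 0.0
print(cor_p(selected, true_top))  # 0.4, matching the example after Definition 2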
2.3 Probabilistic distribution and expected correctness

While we may estimate the relevancy of a database db to a query q, r̂(db, q), using existing estimators, we do not know the exact r(db, q) value until we actually probe db. Therefore, we may model r(db, q) to follow a probabilistic distribution that (hopefully) centers around the r̂(db, q) value. We refer to this distribution as a Probabilistic Relevancy Distribution, or PRD. In Figure 3(a), we show example PRDs for four databases, db1, ..., db4. The horizontal axis in the figure represents the actual relevancy value of a database and the vertical axis shows the probability density that the actual relevancy is at the given value. For instance, for db3, the estimated relevancy is 0.5, and the actual relevancy lies in a range around this estimate. (We explain the impulses for db1 and db2 shortly.) Formally, a PRD tells us the probability that r(db, q) is lower than a certain value α given that the estimate r̂(db, q) equals β: P(r(db, q) ≤ α | r̂(db, q) = β). In Section 4 we explain how we can obtain a PRD by issuing a small number of sample queries to a database. For now we assume that the metasearcher knows the PRD of every database.

Note that after probing db, the r(db, q) value is known. Thus the PRD for r(db, q) changes from a broad distribution to an impulse function. For example, in Figure 3(a), we assume db1 and db2 have already been probed, so their PRDs have become impulses at their correct relevancy values, 0.8 and 0.6, respectively. In the middle of a dynamic-probing process, therefore, we have impulse PRDs for the probed databases, and regular PRDs for the rest.

We now illustrate how we can use the PRDs to estimate the probability that a top-k answer DB_k is correct.

Example 2. We assume the situation shown in Figure 3(a): db1 and db2 have already been probed and their relevancy values

Figure 3: The Probabilistic Relevancy Distributions of different databases at various stages of probing: (a) after probing db1 and db2; (b) after probing db1, db2 and db3.

are 0.8 and 0.6, respectively. We do not know the exact relevancy values for db3 and db4, but the PRD of db3 indicates that r(db3, q) ≥ r(db2, q) = 0.6 with 20% probability. We use the absolute correctness metric for an answer set. Now suppose the user wants the metasearcher to return the top-2 databases. In this scenario, if we return {db1, db2}, our answer is correct (Cor_a({db1, db2}) = 1) with 80% probability, because r(db3, q) < r(db2, q) with 80% probability (in which case {db1, db2} are the actual top-2 databases). With the remaining 20% probability, r(db3, q) may be larger than r(db2, q), so our answer {db1, db2} is wrong (Cor_a({db1, db2}) = 0) with 20% probability. Therefore, the expected correctness of the answer {db1, db2} is 1 · 0.8 + 0 · 0.2 = 0.8. (Note that r(db4, q) is always smaller than r(db2, q) and r(db3, q).)

The expected correctness in Example 2 can be better understood on a statistical basis. For example, if the user issues 1,000 queries to a metasearcher, and the metasearcher returns DB_k such that its expected correctness is greater than 0.8 for every query, then the user gets correct answers for at least 800 queries. We now illustrate how a user may use the expected correctness to specify the quality of the answer and how the metasearcher can use dynamic probing to meet the user's specification.

Example 3. Still consider the situation in Figure 3(a). After probing db1 and db2, the metasearcher knows that the expected correctness of {db1, db2} is 0.8. If the user only requires 0.7 expected correctness, the metasearcher can stop probing and return {db1, db2}. If the user's threshold is 0.9, the metasearcher has to probe more databases. Suppose the metasearcher picks db3 for probing. The resulting PRDs are shown in Figure 3(b). Now the metasearcher knows that db3 and db4 are definitely smaller than db2, and {db1, db2} must be the correct answer. Therefore the expected correctness of {db1, db2} is 1 (which exceeds the user's threshold, 0.9). As a result, the metasearcher can stop probing and return {db1, db2}.

The above example shows that we can consider the expected correctness as the knob that the user can turn so as to control the result quality. Given the user's expected correctness specification, the metasearcher keeps on probing databases till it finds a DB_k that exceeds the user-specified threshold. To help our discussion, we refer to the set of databases that have been probed during this process as DB_P and the set of unprobed databases as DB_U. Note that the returned databases DB_k may or may not be the same as the probed databases DB_P. In particular, DB_k may contain a database db that has not been probed (db ∉ DB_P). As long as the metasearcher is confident that r(db, q) is higher than those of others, it is safe to return db as part of DB_k.

From the example, it is clear that we should be able to compute the expected correctness for DB_k given the PRDs of the databases. We use the notation E[Cor_a(DB_k)] and E[Cor_p(DB_k)] to refer to the expected correctness of DB_k under the absolute and partial correctness metric, respectively. When we do not care about a particular correctness metric, we use the notation E[Cor(DB_k)]. According to our Cor_a and Cor_p definitions, the expected correctness can be computed as:

E[Cor_a(DB_k)] = 1 · P(DB_k = DB_k^top) + 0 · P(DB_k ≠ DB_k^top) = P(|DB_k ∩ DB_k^top| = k)    (2)

E[Cor_p(DB_k)] = Σ_i (i/k) · P(|DB_k ∩ DB_k^top| = i)    (3)

The following theorems tell us how to compute the expected absolute correctness, E[Cor_a(DB_k)], and the expected partial correctness, E[Cor_p(DB_k)], using the PRD of each database. We label the databases in DB_k as db_1, db_2, ..., db_k, and label the databases in DB − DB_k as db_{k+1}, ..., db_n. Let f_j(x_j) be the probability density function derived from db_j's PRD (1 ≤ j ≤ n), and x_j be one possible value of r(db_j, q).

Theorem 1. Assuming that all databases operate independently,

E[Cor_a(DB_k)] = ∫ ··· ∫ [ ∏_{db ∈ DB − DB_k} P(r(db, q) < min(x_1, ..., x_k)) ] · ∏_{j=1}^{k} f_j(x_j) dx_1 ··· dx_k

where min(x_1, ..., x_k) is the minimum value among all the x_j of db_j ∈ DB_k.

Proof. See Appendix.
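Theorem 1's integral can be awkward to evaluate in closed form for arbitrary PRDs. As a sanity check of the definition (this is not the paper's own implementation), the following Python sketch estimates E[Cor_a(DB_k)] by Monte Carlo sampling from each database's PRD; the sampler functions are hypothetical stand-ins for whatever PRD representation a metasearcher maintains.

import random

# Hypothetical PRD samplers: each callable returns one draw of the actual
# relevancy r(db, q). Probed databases are represented by impulse PRDs
# (constant functions). Mirrors the situation of Figure 3(a).
prd_samplers = {
    "db1": lambda: 0.8,                          # probed: impulse at 0.8
    "db2": lambda: 0.6,                          # probed: impulse at 0.6
    "db3": lambda: random.uniform(0.3, 0.65),    # unprobed: assumed spread
    "db4": lambda: random.uniform(0.05, 0.25),   # unprobed: assumed spread
}

def expected_cor_a(selected, samplers, trials=100_000):
    """Monte Carlo estimate of E[Cor_a(DB_k)]: the probability that the
    selected databases are exactly the k databases with highest relevancy."""
    k, hits = len(selected), 0
    for _ in range(trials):
        draws = {db: sample() for db, sample in samplers.items()}
        top_k = sorted(draws, key=draws.get, reverse=True)[:k]
        hits += set(top_k) == set(selected)
    return hits / trials

print(expected_cor_a(["db1", "db2"], prd_samplers))
# roughly P(r(db3, q) < 0.6) under the assumed db3 distribution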

Theorem 2. Assuming that all databases operate independently,

E[Cor_p(DB_k)] = Σ_i (i/k) Σ_{DB_i ⊆ DB − DB_k, |DB_i| = k − i} ∫ ··· ∫ [ ∏_{db ∈ DB_i} P(r(db, q) > i-highest(x_1, ..., x_k)) ] · [ ∏_{db ∈ (DB − DB_k − DB_i)} P(r(db, q) < i-highest(x_1, ..., x_k)) ] · ∏_{j=1}^{k} f_j(x_j) dx_1 ··· dx_k

where i-highest(x_1, ..., x_k) is a function that computes the i-th highest value among all the x_j of db_j ∈ DB_k.

Proof. See Appendix.

2.4 Two-stage cost models

When a user interacts with a metasearcher, his eventual goal is to retrieve a set of relevant documents. Therefore, the overall metasearching process can be separated into two stages as shown in Figure 4. In the first stage, the dynamic prober finds an answer set DB_k by probing a few databases. In the second stage, the document retriever contacts each selected database, retrieves the relevant documents and returns them to the user.

Figure 4: Two-stage cost models.

In measuring the cost of our metasearching framework, we may use one of the following metrics:

Probing cost model: We only consider the cost for the probing stage, ignoring the cost for the document-retrieval stage. We assume that the probing cost of a single database is c and is identical for every database. (It is straightforward to extend our model to the case where the probing cost for each database is different.) Since the dynamic prober does |DB_P| probings in the first stage, |DB_P| · c is the cost under this model.

Probing-and-Retrieval (PR) cost model: We also consider the cost for the document retrieval stage. The cost for the retrieval stage may depend on whether a selected database was probed or not in the first stage. For example, suppose the dynamic prober returns {db1, db2} as the top-2 databases after probing db1 (but not db2). If the dynamic prober has retrieved the top ranking documents of db1 during its probing of db1 (which will be necessary if our relevancy definition is document-similarity-based; see Section 2.1), then the metasearcher may have cached the retrieved documents, so that it will not contact db1 again in the second stage. In this case, the metasearcher only contacts db2 in the second stage to retrieve its top ranking documents. Let the retrieval cost for an unprobed database be d2 and the cost for a probed database be d1 (d1 ≤ d2). Then the probing and retrieval cost in the overall metasearching process is

|DB_P| · c + |DB_k ∩ DB_P| · d1 + |DB_k − DB_P| · d2

In this paper, we mainly use the probing cost model as our cost metric. Note that an optimal algorithm for the probing cost model may not be optimal for the PR cost model: Even if an algorithm does fewer probings during the first stage, the algorithm may incur a significant cost during the second stage if none of the returned databases was probed. However, the following theorem shows that under a certain condition, an optimal probing strategy for the probing cost model is also optimal for the PR cost model.

Theorem 3. Under the condition that DB_k ⊆ DB_P (i.e. all the returned databases have been probed), the optimal probing algorithm under the probing-only cost model is also optimal for the PR cost model.

Proof. See Appendix.

In our experiments, we observed that the condition in the above theorem is valid in most cases. That is, DB_k ⊆ DB_P in the majority of cases, which means our algorithm is optimal also for the PR cost model.

3 The Dynamic Probing Algorithm

Given a query q, n databases and a threshold t, our goal is to use a minimum number of probings to find a k-subset DB_k whose expected correctness exceeds t. Figure 5 roughly illustrates our dynamic probing process to achieve this goal.

Figure 5: The dynamic probing process.

At any intermediate step of the dynamic probing, the entire set of databases DB is divided into two subsets: the set of probed databases DB_P and the set of unprobed databases DB_U. Based on the impulse and regular PRDs of DB_P and DB_U, we compute the expected correctness of every k-subset DB_k (using Theorem 1 for the absolute correctness, for example). If there is a DB_k such that E[Cor(DB_k)] ≥ t, the dynamic probing halts and returns this DB_k; otherwise it continues to probe one more database in DB_U,

moves it from DB_U to DB_P, and recomputes the expected correctness for every DB_k. Figure 6 provides the algorithm of our dynamic probing process. At each iteration, we try to find a k-subset DB_k that has the desired level of expected correctness and return it (Step [2]). If no such DB_k exists, we pick a database from the unprobed set (Step [3]), probe it (Step [4]) and recompute the expected correctness (go to Step [2]). Note that one key issue in this algorithm is how SelectDb(DB_U) should pick the next best database to probe in order to minimize the probing cost. In the next subsection, we derive the answer to this question.

Algorithm 3.1 DPro(DB, q, k, t)
Input:
  DB: the entire set of given databases, {db_1, ..., db_n}
  q: a given query
  k: the number of databases to return
  t: the user's threshold for E[Cor(DB_k)]
  PRDs of the probed and unprobed databases
Output: DB_k with E[Cor(DB_k)] ≥ t
Procedure
  [1] DB_P ← ∅, DB_U ← DB
  [2] If (E[Cor(DB_k)] ≥ t) for some DB_k ⊆ DB
        Return DB_k
  [3] db_i ← SelectDb(DB_U)
  [4] Probe db_i
  [5] Change the PRD of db_i from regular to an impulse
  [6] DB_P ← DB_P ∪ {db_i}, DB_U ← DB_U − {db_i}
  [7] Go to [2]

Figure 6: The dynamic probing algorithm DPro.

Figure 7: Selecting the top-2 databases from {db1, db2, db3}: (a) original PRDs of db1, db2 and db3; (b) the outcome of probing db2; (c) the first possible outcome of probing db3; (d) the second possible outcome of probing db3.

3.1 Selecting the optimal candidate database for probing

In SelectDb(DB_U), we need to select the next database candidate that will lead to the earliest termination of the probing process, thus minimizing the probing cost. This database often should not be the one with the largest expected relevancy. Consider the following example.

Example 4. We want to return the top-2 databases from {db1, db2, db3}. We have not probed any of them. Figure 7(a) shows their PRDs. We assume that E[Cor(DB_2)] is smaller than the user threshold t for any DB_2 ⊆ {db1, db2, db3} yet. We need to pick the next database to probe. Note that we do not need to probe db1, because its relevancy is the highest among all three, and it will always be returned as part of DB_2. Probing db1 does not increase answer correctness at all. Similarly, note that probing db2 is not very helpful, either. Because r(db2, q) lies between the two peaks of r(db3, q), even after we probe db2 (Figure 7(b)), it is still uncertain which one (between db2 and db3) will have higher relevancy. In contrast, probing db3 is very likely to improve the certainty of our answer. Given the PRD of db3, r(db3, q) will be either on the left side of r(db2, q) (Figure 7(c)) or on the right side (Figure 7(d)). If it is on the left side (Figure 7(c)), we can return {db1, db2} as the top-2 databases. If it is on the right side (Figure 7(d)), we can return {db1, db3} as the top-2 databases. In either case, we can return the top-2 databases with high confidence. Therefore, SelectDb(DB_U) should pick db3 as the next database to probe, because we can finish the probing process after only one probing. Otherwise our algorithm needs at least two probings to halt. Notice that the expected relevancy of db3 is the lowest among the three databases. The next database to probe is not the one with the highest expected relevancy.

From this example, we can see that the function SelectDb(DB_U) should pick the db_i ∈ DB_U that yields the smallest number of expected probings. To formalize this idea, we introduce the notation ECost(DB_U) to represent the expected amount of additional probing on DB_U after we have probed DB − DB_U (= DB_P). Now we analyze the expected probing cost if we pick db_i ∈ DB_U as the next database to probe. The cost for probing db_i itself is c. The expected cost after probing db_i is ECost(DB_U − {db_i}) under our notation. Therefore, by probing db_i next, we are expected to incur c + ECost(DB_U − {db_i}) additional probing cost. Based on this understanding, we now describe the function SelectDb(DB_U) in Figure 8.

Algorithm 3.2 SelectDb(DB_U)
Input: DB_U: the set of unprobed databases
Output: db_i: the next database to probe
Procedure
  [1] For every db_i ∈ DB_U:
  [2]   cost_i = c + ECost(DB_U − {db_i})
  [3] Return the db_i with the smallest cost_i

Figure 8: The optimal SelectDb(DB_U) function.

In Steps [1] and [2], the algorithm first computes the expected additional probing cost for every db_i ∈ DB_U. Then Step [3] returns the one with the smallest cost.
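The following Python sketch shows how the DPro loop of Figure 6 and the cost-based selection of Figure 8 fit together. The expected-correctness and expected-cost routines are left as injected callables, since their concrete implementations (Theorems 1 and 2, and the ECost recursion described next) depend on how the PRDs are represented; the callable names and signatures are assumptions for illustration, not a published API.

# Sketch of the DPro loop (Figure 6) with the cost-based SelectDb (Figure 8).
# `expected_cor`, `expected_cost`, and `probe` are assumed callables supplied
# by the metasearcher (they are expected to capture the PRDs internally).

from itertools import combinations

def dpro(databases, query, k, threshold, expected_cor, expected_cost, probe, c=1.0):
    probed, unprobed = {}, set(databases)        # DB_P (with actual relevancies), DB_U
    while True:
        # Step [2]: return any k-subset whose expected correctness reaches t.
        for subset in combinations(databases, k):
            if expected_cor(subset, probed, query) >= threshold:
                return set(subset)
        # Step [3]: pick the unprobed database with smallest c + ECost(DB_U - {db_i}).
        best = min(unprobed,
                   key=lambda db: c + expected_cost(unprobed - {db}, probed, query))
        # Steps [4]-[6]: probe it and turn its PRD into an impulse.
        probed[best] = probe(best, query)
        unprobed.remove(best)
        # Once every database is probed, the true top-k has expected correctness 1,
        # so the loop is guaranteed to terminate.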

Algorithm 3.3 ECost(DB_U)
Input: DB_U: the set of unprobed databases
Output: cost: the expected probing cost for DB_U
Procedure
  [1] If (E[Cor(DB_k)] ≥ t) for some DB_k ⊆ DB
        Return 0
  [2] For every db_i ∈ DB_U:
  [3]   cost_i = c + ECost(DB_U − {db_i})
  [4] Return min_i(cost_i)

Figure 9: Algorithm ECost(DB_U).

We now explain how we can compute ECost(DB_U) using recursion. We assume that we have probed DB_P so far, and DB_U (= DB − DB_P) has not been probed yet. There are two possible scenarios at this point:

Case 1 (Stopping condition): With the databases DB_P probed, we can find a DB_k ⊆ DB such that E[Cor(DB_k)] ≥ t. In this case, we can simply return DB_k as the top-k databases. We do not need any further probing. Thus,

ECost(DB_U) = 0    (4)

Note that when all databases have been probed (DB_U = ∅), we know the exact relevancy of all databases, so ECost(DB_U) = 0.

Case 2 (Recursion): There is no DB_k ⊆ DB whose expected correctness exceeds t. Therefore, we need to probe more databases to improve the expected correctness. Assume we probe db_i ∈ DB_U next. Then the expected probing cost is c + ECost(DB_U − {db_i}). Remember that SelectDb(DB_U) always picks the db_i with the minimal expected cost. Therefore, the expected cost at this point is

ECost(DB_U) = min_{db_i ∈ DB_U} ( c + ECost(DB_U − {db_i}) )    (5)

Figure 9 shows the algorithm to compute ECost(DB_U). In Step [1], we first check whether we have reached the stopping condition. If not, we compute the expected probing cost for every db_i ∈ DB_U (Steps [2] and [3]), and return the minimum expected cost (Step [4]). The following theorem shows the optimality of our algorithm SelectDb(DB_U).

Theorem 4. SelectDb(DB_U) returns the database that leads to the minimum expected probing cost, ECost(DB_U), on the set of unprobed databases DB_U.

Proof. See Appendix.

Note that the computation of ECost(DB_U) is recursive and can be very expensive. For example, assume that DB_U = {db_1, ..., db_n} as we show in Figure 10. To compute ECost(DB_U), we have to compute ECost(DB_U − {db_i}) for every 1 ≤ i ≤ n (first-level branching in Figure 10). Then to compute ECost(DB_U − {db_i}), we need to compute ECost(DB_U − {db_i, db_j}) for every j ≠ i (second-level branching in Figure 10). Therefore, the cost for computing ECost(DB_U) is O(n!) if |DB_U| = n. Clearly this is too expensive when we mediate a large number of databases. In the next subsection, we propose a greedy algorithm that reduces the computational complexity of selecting the next database to O(n).

Figure 10: Exploring a search tree to compute ECost(DB_U).

3.2 A greedy choice

The goal of the DPro algorithm is to find a DB_k with E[Cor(DB_k)] ≥ t using a minimum number of probings. Thus, the optimal DPro computes the expected probing cost for all possible probing scenarios and picks the one with the minimum cost. Informally, we may consider that the optimal DPro looks all steps ahead and picks the best one. Our new greedy algorithm, instead, looks only one step ahead and picks the best one. The basic idea of our greedy algorithm is the following: Since we can finish our probing process when E[Cor(DB_k)] exceeds t for some DB_k, the next database that we probe should be the one that leads to the highest E[Cor(DB_k)] after probing (thus most likely to exceed t early). Notice the subtle difference between the optimal algorithm and the greedy algorithm. The optimal algorithm computes ECost(DB_U) for all possible scenarios, while our greedy algorithm computes E[Cor(DB_k)] after we probe only one more database db_i.

Using Theorem 1 we can compute E[Cor_a(DB_k)] after we probe db_i if we know the PRD of each database.⁵ In Figure 11, we show a new SelectDb(DB_U) function that implements this greedy idea. In Steps [1] and [2], the algorithm computes the expected correctness value after we probe db_i. Then in Step [3] it returns the db_i that leads to the highest expected correctness.

Algorithm 3.4 greedySelectDb(DB_U)
Input: DB_U: the set of unprobed databases
Output: db_i: the next database to probe
Procedure
  [1] For every db_i ∈ DB_U:
  [2]   ECor_i = max_{DB_k ⊆ DB} (E[Cor(DB_k)] after probing db_i)
  [3] Return the db_i with the highest ECor_i

Figure 11: The greedy SelectDb(DB_U) function.

(Footnote 5: Since we do not know the outcome of probing db_i, we need to use Theorem 1 to compute an expected E[Cor(DB_k)] value, based on db_i's PRD. The detailed formula is provided in the appendix.)
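A minimal sketch of the greedy selection of Figure 11 is shown below. The helper expected_cor_after_probe is an assumed placeholder for the averaging over db_i's PRD mentioned in footnote 5; it is not code from the paper.

# Sketch of the greedy selection of Figure 11: probe the database whose
# probing is expected to raise the best achievable E[Cor(DB_k)] the most.

from itertools import combinations

def greedy_select_db(databases, unprobed, probed, query, k, expected_cor_after_probe):
    def score(db_i):
        # Best expected correctness over all k-subsets, assuming db_i is probed next.
        return max(expected_cor_after_probe(subset, db_i, probed, query)
                   for subset in combinations(databases, k))
    # One level of look-ahead: O(n) candidate databases instead of O(n!) scenarios.
    return max(unprobed, key=score)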

Figure 12: Distribution of the absolute-error function r̂ − r.

4 Probabilistic Relevancy Distribution

In Section 2, we assumed that the PRD for a database db was already given. We now discuss how we obtain the PRD that gives us P(r(db, q) ≤ α | r̂(db, q) = β), where α and β are specific values. For simplicity, we use r and r̂ to represent r(db, q) and r̂(db, q).

Our basic idea is to use sampling to estimate the PRD. That is, we issue a small number of sampling queries, say 1,000, to db and observe how the actual r values are distributed. From this result, we can compute the difference of r from r̂ and obtain the distribution. Note that the PRD P(r ≤ α | r̂ = β) is conditional on r̂. Therefore, the exact shape of the PRD may be very different for different r̂ values. Ideally, we have to issue a number of sampling queries for each r̂ value, in order to obtain the correct PRD shape for each r̂. However, issuing a set of queries for each r̂ is too expensive given that there are an infinite number of r̂ values. To reduce the cost of PRD estimation, we assume that the distribution we observe is independent of what r̂ may be. More precisely, we may consider one of the following independence assumptions:

Absolute-error independence: We assume that the absolute error of our estimate, r̂ − r (the difference between our estimate and the actual relevancy), is independent of the r̂ value. Therefore, from our sampling queries, we obtain a single distribution for the (r̂ − r) values (even if the r̂ values for the queries are different), and use the distribution to derive the PRD.

Relative-error independence: We assume that the relative error of our estimation, (r̂ − r)/r, is independent of the r̂ value. Therefore, from sampling queries, we obtain a single distribution for the (r̂ − r)/r values (even if the r̂ values for the queries are different) and use the distribution to derive the PRD.

In general, if there is an error function err(r̂, r) (e.g., err(r̂, r) = r̂ − r for the first case) whose distribution is independent of r̂, then we can use just one set of queries (regardless of their r̂ values) to estimate the err(r̂, r) distribution. Then using this distribution, we can obtain the correct PRD for every r̂ value. This can be illustrated through the following example:

Example 5. Suppose from 1,000 sampling queries, we are able to obtain a probability distribution for the absolute-error function err(r̂, r) = r̂ − r, as shown in Figure 12. Assume that r̂ − r is independent of r̂. Let us derive the probability P(r ≤ 50 | r̂ = 100) using this distribution.

P(r ≤ 50 | r̂ = 100) = P(r̂ − r ≥ 50 | r̂ = 100) = P(r̂ − r ≥ 50)   (independence of r̂ − r and r̂)

This probability, P(r̂ − r ≥ 50), as shown in Figure 12, is 0.8.

More formally, we observe that the error function err(r̂, r) should satisfy the following properties to derive a PRD:

I. Independence: err(r̂, r) is probabilistically independent of r̂.
II. Monotonicity: err(r̂, r1) ≥ err(r̂, r2) for any r1 ≤ r2.

The following theorem shows that the probability of a relevancy value can be obtained through the probability of the error function, via a variable transformation from r to err(r̂, r).

Theorem 5. If err(r̂, r) is independent and monotonic, then

P(r ≤ α | r̂ = β) = P(err(r̂, r) ≥ err(β, α))    (6)

Proof. See Appendix.

In Section 5, we compare the absolute-error function, err_a(r̂, r) = r̂ − r, and the relative-error function, err_r(r̂, r) = (r̂ − r)/r, experimentally. Our result shows that the relative-error function works well in practice and roughly satisfies the two properties in Theorem 5.
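A minimal Python sketch of this sampling procedure, under the relative-error independence assumption the paper ends up using: collect (r̂ − r)/r samples from training queries, then answer PRD queries P(r ≤ α | r̂ = β) via Theorem 5. The sample values below are fabricated placeholders for illustration.

# Sketch of deriving a PRD from sampled relative errors (Section 4, Theorem 5),
# assuming err_r(r_hat, r) = (r_hat - r) / r is independent of r_hat.

def relative_errors(samples):
    """samples: list of (estimated, actual) relevancy pairs from sampling queries."""
    return [(r_hat - r) / r for r_hat, r in samples if r > 0]

def prd_probability(errors, alpha, beta):
    """P(r <= alpha | r_hat = beta) = P(err_r >= err_r(beta, alpha)) by Theorem 5,
    estimated empirically from the sampled error values."""
    threshold = (beta - alpha) / alpha
    return sum(e >= threshold for e in errors) / len(errors)

# Hypothetical sampling-query outcomes (estimate, actual):
samples = [(120, 100), (80, 100), (300, 250), (40, 60), (500, 480)]
errors = relative_errors(samples)
print(prd_probability(errors, alpha=50, beta=100))  # P(r <= 50 | r_hat = 100)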
5 Experiments

This section reports our experimental results that testify to the effectiveness of the dynamic probing approach. Section 5.1 describes the experimental setup and the dataset we use. Section 5.2 experimentally compares the error functions to derive a PRD. Sections 5.3 and 5.4 show the improvement of our dynamic probing compared to the existing methods.

5.1 Experimental setup

In our experiments, we simulate a real metasearching application by mediating 20 real Hidden-Web databases and using 4,000 real Web query traces. The databases for our experiments are mainly related to health. Thus, we may consider that the experiments evaluate the effectiveness of our dynamic probing approach in the context of a health-related metasearcher. In this subsection, we explain our experimental setup in detail.

First, we select databases from the health category of InvisibleWeb,⁶ which is a manually-maintained directory of Hidden-Web databases. While the directory lists a large number of health-related databases, most of them are either obsolete or too small. In our experiments, we use only the databases with at least a few thousand documents. Most of the small databases are relatively obscure and of little interest. Because this is a relatively small number, and in order to introduce heterogeneity to our experiments, we append four more databases on broader topics (e.g., Science and Nature), and three more news websites (e.g., CNN and NYTimes). We show some sample databases and their sizes in Figure 13.⁷ The complete list of our databases can be found in [20].

Second, we select a subset of queries from a real query trace from Yahoo (provided by Overture⁸). We start by building a sample medical vocabulary using single terms extracted from the health topic pages in MedLinePlus,⁹ an authoritative medical information website. We then randomly pick 2-term and 3-term queries from the Yahoo query trace that use at least two terms from our vocabulary. Again, this selection was done to simulate a metasearcher that focuses on health-related topics.

⁶ http://
⁷ For databases that do not export their sizes, we roughly estimate the size by issuing a query with common terms, e.g. medical OR health OR cancer...
⁸ http://inventory.overture.com/
⁹ http://

Database        | Size
MedWeb          | 4,000
PubMed Central  | 60,000
NIH             | 6,799
Science         | 9,65

Figure 13: Sample Web databases used in our experiment.

Using the above selection method, we prepare a sample query set QS1 which contains 1,000 2-term queries and 1,000 3-term queries. We use QS1 in Section 5.2 to derive the PRD for each database. Similarly, we prepare another query set QS2 that contains, again, 1,000 2-term queries and 1,000 3-term queries. QS2 is used in Sections 5.3 through 5.4 when we evaluate how well our dynamic prober works. Note that a typical Web query has only a small number of terms, with 2.2 terms on average [19]. Therefore, we believe our experiments using 2- or 3-term queries reflect the typical scenario that a real metasearcher can expect.

In all of our experiments, we use the document-frequency-based metric (Section 2.1) and the independence estimator (Eq. 1). Further, we use the independence estimator to create a baseline representing traditional estimation-based selection methods. All of our dynamic probing is done using the greedy algorithm (Section 3.2). Due to its exponential computational cost, it takes a long time for the optimal algorithm to terminate, so we could not finish enough experiments to include their results in this draft.

5.2 Selecting an error function to derive the correct PRD

To obtain the correct PRD from the err(r̂, r) distribution, err(r̂, r) needs to be monotonic and independent of r̂ (Theorem 5). In this subsection, we experimentally compare the absolute-error function err_a(r̂, r) = r̂ − r with the relative-error function err_r(r̂, r) = (r̂ − r)/r, and select the better one for our experiment. From their analytical forms, it is easy to verify that both error functions are monotonic. What we need to verify is the independence property. This can be done by computing the statistical correlation between err(r̂, r) and r̂, where err can be err_a or err_r. If the correlation is close to 0, it means err and r̂ are roughly independent; otherwise they are not.

More specifically, we first obtain the err(r̂, r) value for each of the 1,000 2-term queries in QS1 on a database db. We then compute the correlation between err(r̂, r) and r̂ over these 1,000 queries for db. We repeat this process for all 20 databases and compute the average correlation over all databases. We similarly compute the correlation for the 1,000 3-term sample queries in QS1, and summarize the results in Figure 14. The maximum correlation value among the 20 databases is also included to show the extreme cases.

Figure 14(a) shows that the absolute error err_a(r̂, r) has a high positive correlation with r̂ for both 2-term and 3-term queries. Therefore, err_a(r̂, r) is dependent on r̂ and becomes larger as r̂ gets larger. Figure 14(b) reveals that the relative error err_r(r̂, r) is roughly independent of r̂. Therefore we use err_r(r̂, r) as the error function to derive a PRD. From our experiments, we observe that the shape of the err_r(r̂, r) distribution for 2-term queries is slightly different from that for 3-term queries. Therefore, we maintain two PRDs for each database, one for 2-term queries and the other for 3-term queries, and pick the appropriate PRD depending on the number of terms in a query.

Figure 14: The correlation between err(r̂, r) and r̂: (a) average and maximum correlation of err_a(r̂, r) with r̂ over the 20 databases; (b) average and maximum correlation of err_r(r̂, r) with r̂ over the 20 databases (2-term and 3-term queries).

Figure 15: The effect of dynamic probing on the average correctness (k = 1, t = 0.9).

5.3 Effectiveness of dynamic probing

In the second set of our experiments, we study the impact of dynamic probing on the correctness of database selection. Our main goal in this section is to investigate how accurate an answer becomes as we probe more databases, so we restrict our experiments only to the queries that require at least three probings for DPro to terminate (i.e., E[Cor(DB_k)] ≥ t only after three probings). When we set t = 0.9 and k = 1 as our parameters, over 1,000 of the 2,000 test queries in QS2 (1,000 2-term queries and 1,000 3-term queries) belong to this category. For each query issued, we then ask DPro to report the database with the highest expected correctness after each probing (even if it has not terminated yet). By comparing this reported database to the most relevant database (i.e., DB_k = DB_k^top?) we can measure how accurate the answer becomes as we probe more databases. Note that the correct DB_k^top is inaccessible to DPro during its probing process.

Figure 15 summarizes the result from these experiments. In the figure, the horizontal axis shows the number of probings that DPro has performed so far. The vertical axis shows the fraction of correct answers that DPro reports at the given number of probings. For example, after one probing, DPro reports the correct database for roughly half of these queries. Note that at the point of no probing (# of probings = 0), DPro is identical to the traditional estimation-based method because it does not use any dynamic probing; at this point the average correctness is much lower. From this result, it is clear that dynamic probing significantly improves the answer correctness: we can improve the correctness of the answer by more than twice with only two probings.

(Footnote 10: Note that Cor_a and Cor_p are the same when k = 1. Therefore we do not specify our correctness metric in this experiment.)

Figure 16: The average number of probings under different settings of k and t; each panel plots, for one k value (panel (a) shows k = 1 and panel (c) shows k = 5), the average number of probings against t, under the Cor_a and Cor_p metrics.

5.4 The average amount of probing under different settings

In this subsection, we study how many probings DPro does for different settings of t. We experiment on six t values: {0.7, 0.75, 0.8, 0.85, 0.9, 0.95}. For a larger t, it is expected that DPro probes more databases to meet the threshold. In Figure 16, we show how the number of probings increases as t becomes larger. The x-axis shows the different t values, and the y-axis is the average number of probings DPro does for a particular t, over the 2,000 test queries in QS2. For example, when k = 1 (Figure 16(a)), DPro terminates after only a few probings on average for the threshold value t = 0.9. In Figures 16(b) and (c) we include the results for the absolute (Cor_a) and partial (Cor_p) correctness metrics. When k = 1, the two correctness metrics are the same, so we have only one graph in Figure 16(a). Note that the graph for Cor_a is always above that of Cor_p. Since Cor_p is always larger than Cor_a, DPro reaches the correctness threshold faster under Cor_p and terminates earlier.

The figure shows that our algorithm DPro can find correct databases with a reasonable number of probings. For example, when k = 5 and t = 0.9 (Figure 16(c)), DPro finds a DB_5 with E[Cor_p(DB_5)] > 0.9 after 6.8 probings. In most cases, all 5 returned databases are probed during the selection process. That means that 5 of the 6.8 probings are done on the top-5 databases returned, so the information that we collect during the probing stage can be used to reduce the cost of the document retrieval stage (Figure 4). So the extra probing in the overall metasearching process is only 1.8.

Note that even if the user-specified threshold t is 0.7, the top-k databases that DPro returns may be correct more than 70% of the time. The user threshold t is simply a lower bound for the correctness of the returned answer. To show how accurate the answers DPro returns are, Figure 17 and Figure 18 show the average correctness of the answers for different threshold values. Figure 17 shows the result under the Cor_a metric, and Figure 18 shows the result under the Cor_p metric. The baseline (the triangle line) is the average correctness of the traditional estimation-based selection. Since the traditional method does not depend on the t value, the average correctness remains constant. The dotted lines in the figures represent Avg(Cor) = t. The average correctness of the answers from DPro should be higher than the dotted line, since t is the minimum threshold value for DPro to terminate. From the graphs, we can see that this is indeed the case.

6 Related work

Database selection is a critical step in the metasearching process. Past research mainly focused on applying certain approximate methods to estimate how relevant a database is to the user's query. The databases with the highest estimated relevancy are selected and presented to the user. The quality of database selection is highly dependent on the accuracy of the estimation method. In the early work of bGlOSS [14], which mediates databases with boolean search interfaces, a metasearcher estimates the relevancy of each database by assuming query terms appear independently. vGlOSS [15] extends bGlOSS to support databases with vector-based search interfaces, and uses a high-correlation assumption or a disjoint assumption on query terms to estimate the relevancy of a database under the vector-space model. [] uses term covariance information to model the dependency between each pair of terms, and achieves better estimation than vGlOSS. An even better estimation is reported in [5] by incorporating document linkage information.

There has been parallel research in the distributed information retrieval context. In [2, 5, 4] the relevancy of a database is modelled by the probability of the database containing similar documents to the query. In [4], various estimation methods discussed above are compared on a common basis. Our dynamic probing method is orthogonal to this research in that we are not proposing a new estimation method under a certain relevancy definition. Instead, we use a probabilistic distribution to model the accuracy of a particular estimation method, and use probing to increase the correctness of database selection.

Database selection is related to a broader research area called top-k query answering. Past research [11, 7, 8, 9] largely focused on relational data, and uses deterministic methods to find the absolutely correct top-k answers. In our context of Hidden-Web database selection, enforcing the deterministic approach would end up probing almost all the Hidden-Web databases. In our probabilistic approach, we only probe the databases that would maximally increase our certainty of the top-k answers.

Mediating heterogeneous databases to provide a single query interface has been studied for years [17, 12]. While the existing research focused on integrating data sources with relational search capabilities, we in this paper investigate the mediation of Hidden-Web databases with much more primitive query interfaces over a collection of unstructured textual data.

7 Conclusion

We have presented a new approach to the Hidden Web database selection problem using dynamic probing. In our approach, the accuracy of a particular estimator is modelled using a Probabilistic Relevancy Distribution (PRD). The PRD enables us to quantify the correctness of a particular top-k answer set in a probabilistic sense. We propose an optimal probing strategy that uses the least probing to reach the user-specified correctness threshold. A greedy probing strategy with much lower computational complexity is also presented. Our experimental results reveal that dynamic probing significantly improves the answer's correctness with a reasonably small amount of probing.

Figure 17: Avg(Cor_a): dynamic probing vs. the estimation-based database selection, for the same k settings as Figure 16. Each plot compares dynamic probing using Cor_a, estimation-based selection (no probing), and the line Avg(Cor_a) = t.

Figure 18: Avg(Cor_p): dynamic probing vs. the estimation-based database selection, for the same k settings as Figure 16. Each plot compares dynamic probing using Cor_p, estimation-based selection (no probing), and the line Avg(Cor_p) = t.

Our experimentation on real datasets justifies an effective new direction for metasearching research. In the past, researchers tried to improve the correctness of database selection by constructing more accurate estimators that estimate the database's relevancy to a particular query. A more accurate estimator demands a more comprehensive content summary of each database. For example, storing the pair-wise term covariance [, 5] takes O(M²) space, where M is the size of the vocabulary. However, once the estimator is constructed, the correctness of database selection is fixed at a certain level and cannot be explicitly controlled by the user. In our dynamic probing approach, the user explicitly specifies the desired level of correctness, regardless of what estimator we use. Our results reveal that using the estimator developed in bGlOSS [14], the answer's correctness is greatly improved via a small amount of probing.

References

[1] M.K. Bergman. The Deep Web: Surfacing Hidden Value. Accessible at DeepWeb, 2000.
[2] C. Baumgarten. A Probabilistic Solution to the Selection and Fusion Problem in Distributed Information Retrieval. In Proc. of ACM SIGIR 99, CA, 1999.
[3] J.A. Borges, I. Morales, N.J. Rodriguez. Guidelines for Designing Usable World Wide Web Pages. In Proc. of ACM SIGCHI 96, http://sigchi.org/chi96/, 1996.
[4] N. Craswell, P. Bailey, and D. Hawking. Server Selection on the World Wide Web. In Proc. of ACM Conf. on Digital Libraries 2000, TX, 2000.
[5] J.P. Callan, Z. Lu, and W. Croft. Searching Distributed Collections with Inference Networks. In Proc. of ACM SIGIR 95, WA, 1995.
[6] J. Callan, M. Connell, and A. Du. Automatic Discovery of Language Models for Text Databases. In Proc. of ACM SIGMOD 99, PA, 1999.
[7] S. Chaudhuri and L. Gravano. Optimizing Queries over Multimedia Repositories. In Proc. of ACM SIGMOD 96, Canada, 1996.
[8] S. Chaudhuri and L. Gravano. Evaluating Top-k Selection Queries. In Proc. of VLDB 99, Scotland, 1999.
[9] K. Chang and S. Hwang. Minimal Probing: Supporting Expensive Predicates for Top-k Queries. In Proc. of ACM SIGMOD 02, WI, 2002.
[10] A. Clyde. The Invisible Web. Teacher Librarian 29(4), 2002.
[11] R. Fagin. Combining Fuzzy Information from Multiple Systems. In Proc. of ACM PODS 96, Canada, 1996.
[12] H. Garcia-Molina, Y. Papakonstantinou, D. Quass, et al. The TSIMMIS Project: Integration of Heterogeneous Information Sources. J. Intelligent Information Systems 8(2):117-132, 1997.
[13] M.R. Garey, D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, 1990.
[14] L. Gravano, H. Garcia-Molina, A. Tomasic. The Effectiveness of GlOSS for the Text Database Discovery Problem. In Proc. of ACM SIGMOD 94, MN, 1994.
[15] L. Gravano and H. Garcia-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. In Proc. of VLDB 95, Switzerland, 1995.
[16] L. Gravano, H. Garcia-Molina, A. Tomasic. GlOSS: Text-Source Discovery over the Internet. ACM TODS 24(2):229-264, 1999.
[17] A.Y. Halevy. Answering Queries Using Views: A Survey. VLDB Journal 10(4):270-294, 2001.
[18] P.G. Ipeirotis, L. Gravano. Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection. In Proc. of VLDB 02, China, 2002.
[19] S. Kirsch. The Future of Internet Search: Infoseek's Experiences Searching the Internet. ACM SIGIR Forum 32(2):3-7, 1998.
[20] V.Z. Liu, R.C. Luo, J. Cho, W.W. Chu. A Probabilistic Framework for Hidden Web Database Selection Using Dynamic Probing. Technical report, UCLA Computer Science Department, 2002.

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

20. Applications of the Genetic-Drift Model

20. Applications of the Genetic-Drift Model 0. Applicaions of he Geneic-Drif Model 1) Deermining he probabiliy of forming any paricular combinaion of genoypes in he nex generaion: Example: If he parenal allele frequencies are p 0 = 0.35 and q 0

More information

Retrieval Models. Boolean and Vector Space Retrieval Models. Common Preprocessing Steps. Boolean Model. Boolean Retrieval Model

Retrieval Models. Boolean and Vector Space Retrieval Models. Common Preprocessing Steps. Boolean Model. Boolean Retrieval Model 1 Boolean and Vecor Space Rerieval Models Many slides in his secion are adaped from Prof. Joydeep Ghosh (UT ECE) who in urn adaped hem from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong) Rerieval

More information

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

Some Basic Information about M-S-D Systems

Some Basic Information about M-S-D Systems Some Basic Informaion abou M-S-D Sysems 1 Inroducion We wan o give some summary of he facs concerning unforced (homogeneous) and forced (non-homogeneous) models for linear oscillaors governed by second-order,

More information

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin ACE 56 Fall 005 Lecure 4: Simple Linear Regression Model: Specificaion and Esimaion by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Simple Regression: Economic and Saisical Model

More information

Final Spring 2007

Final Spring 2007 .615 Final Spring 7 Overview The purpose of he final exam is o calculae he MHD β limi in a high-bea oroidal okamak agains he dangerous n = 1 exernal ballooning-kink mode. Effecively, his corresponds o

More information

An introduction to the theory of SDDP algorithm

An introduction to the theory of SDDP algorithm An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking

More information

Random Walk with Anti-Correlated Steps

Random Walk with Anti-Correlated Steps Random Walk wih Ani-Correlaed Seps John Noga Dirk Wagner 2 Absrac We conjecure he expeced value of random walks wih ani-correlaed seps o be exacly. We suppor his conjecure wih 2 plausibiliy argumens and

More information

Vehicle Arrival Models : Headway

Vehicle Arrival Models : Headway Chaper 12 Vehicle Arrival Models : Headway 12.1 Inroducion Modelling arrival of vehicle a secion of road is an imporan sep in raffic flow modelling. I has imporan applicaion in raffic flow simulaion where

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

Approximation Algorithms for Unique Games via Orthogonal Separators

Approximation Algorithms for Unique Games via Orthogonal Separators Approximaion Algorihms for Unique Games via Orhogonal Separaors Lecure noes by Konsanin Makarychev. Lecure noes are based on he papers [CMM06a, CMM06b, LM4]. Unique Games In hese lecure noes, we define

More information

Solutions from Chapter 9.1 and 9.2

Solutions from Chapter 9.1 and 9.2 Soluions from Chaper 9 and 92 Secion 9 Problem # This basically boils down o an exercise in he chain rule from calculus We are looking for soluions of he form: u( x) = f( k x c) where k x R 3 and k is

More information

Lecture 33: November 29

Lecture 33: November 29 36-705: Inermediae Saisics Fall 2017 Lecurer: Siva Balakrishnan Lecure 33: November 29 Today we will coninue discussing he boosrap, and hen ry o undersand why i works in a simple case. In he las lecure

More information

Longest Common Prefixes

Longest Common Prefixes Longes Common Prefixes The sandard ordering for srings is he lexicographical order. I is induced by an order over he alphabe. We will use he same symbols (,

More information

STATE-SPACE MODELLING. A mass balance across the tank gives:

STATE-SPACE MODELLING. A mass balance across the tank gives: B. Lennox and N.F. Thornhill, 9, Sae Space Modelling, IChemE Process Managemen and Conrol Subjec Group Newsleer STE-SPACE MODELLING Inroducion: Over he pas decade or so here has been an ever increasing

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

3.1 More on model selection

3.1 More on model selection 3. More on Model selecion 3. Comparing models AIC, BIC, Adjused R squared. 3. Over Fiing problem. 3.3 Sample spliing. 3. More on model selecion crieria Ofen afer model fiing you are lef wih a handful of

More information

EXERCISES FOR SECTION 1.5

EXERCISES FOR SECTION 1.5 1.5 Exisence and Uniqueness of Soluions 43 20. 1 v c 21. 1 v c 1 2 4 6 8 10 1 2 2 4 6 8 10 Graph of approximae soluion obained using Euler s mehod wih = 0.1. Graph of approximae soluion obained using Euler

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

Echocardiography Project and Finite Fourier Series

Echocardiography Project and Finite Fourier Series Echocardiography Projec and Finie Fourier Series 1 U M An echocardiagram is a plo of how a porion of he hear moves as he funcion of ime over he one or more hearbea cycles If he hearbea repeas iself every

More information

Ensamble methods: Bagging and Boosting

Ensamble methods: Bagging and Boosting Lecure 21 Ensamble mehods: Bagging and Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Ensemble mehods Mixure of expers Muliple base models (classifiers, regressors), each covers a differen par

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

Comparing Means: t-tests for One Sample & Two Related Samples

Comparing Means: t-tests for One Sample & Two Related Samples Comparing Means: -Tess for One Sample & Two Relaed Samples Using he z-tes: Assumpions -Tess for One Sample & Two Relaed Samples The z-es (of a sample mean agains a populaion mean) is based on he assumpion

More information

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H.

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H. ACE 564 Spring 2006 Lecure 7 Exensions of The Muliple Regression Model: Dumm Independen Variables b Professor Sco H. Irwin Readings: Griffihs, Hill and Judge. "Dumm Variables and Varing Coefficien Models

More information

Ensamble methods: Boosting

Ensamble methods: Boosting Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Georey E. Hinton. University oftoronto. Technical Report CRG-TR February 22, Abstract

Georey E. Hinton. University oftoronto.   Technical Report CRG-TR February 22, Abstract Parameer Esimaion for Linear Dynamical Sysems Zoubin Ghahramani Georey E. Hinon Deparmen of Compuer Science Universiy oftorono 6 King's College Road Torono, Canada M5S A4 Email: zoubin@cs.orono.edu Technical

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 175 CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 10.1 INTRODUCTION Amongs he research work performed, he bes resuls of experimenal work are validaed wih Arificial Neural Nework. From he

More information

2. Nonlinear Conservation Law Equations

2. Nonlinear Conservation Law Equations . Nonlinear Conservaion Law Equaions One of he clear lessons learned over recen years in sudying nonlinear parial differenial equaions is ha i is generally no wise o ry o aack a general class of nonlinear

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED 0.1 MAXIMUM LIKELIHOOD ESTIMATIO EXPLAIED Maximum likelihood esimaion is a bes-fi saisical mehod for he esimaion of he values of he parameers of a sysem, based on a se of observaions of a random variable

More information

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H. ACE 56 Fall 005 Lecure 5: he Simple Linear Regression Model: Sampling Properies of he Leas Squares Esimaors by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Inference in he Simple

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates Biol. 356 Lab 8. Moraliy, Recruimen, and Migraion Raes (modified from Cox, 00, General Ecology Lab Manual, McGraw Hill) Las week we esimaed populaion size hrough several mehods. One assumpion of all hese

More information

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN The MIT Press, 2014 Lecure Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/i2ml3e CHAPTER 2: SUPERVISED LEARNING Learning a Class

More information

Introduction to Probability and Statistics Slides 4 Chapter 4

Introduction to Probability and Statistics Slides 4 Chapter 4 Inroducion o Probabiliy and Saisics Slides 4 Chaper 4 Ammar M. Sarhan, asarhan@mahsa.dal.ca Deparmen of Mahemaics and Saisics, Dalhousie Universiy Fall Semeser 8 Dr. Ammar Sarhan Chaper 4 Coninuous Random

More information

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1.

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1. Roboics I April 11, 017 Exercise 1 he kinemaics of a 3R spaial robo is specified by he Denavi-Harenberg parameers in ab 1 i α i d i a i θ i 1 π/ L 1 0 1 0 0 L 3 0 0 L 3 3 able 1: able of DH parameers of

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8) I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression

More information

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis Speaker Adapaion Techniques For Coninuous Speech Using Medium and Small Adapaion Daa Ses Consaninos Boulis Ouline of he Presenaion Inroducion o he speaker adapaion problem Maximum Likelihood Sochasic Transformaions

More information

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD HAN XIAO 1. Penalized Leas Squares Lasso solves he following opimizaion problem, ˆβ lasso = arg max β R p+1 1 N y i β 0 N x ij β j β j (1.1) for some 0.

More information

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17 EES 16A Designing Informaion Devices and Sysems I Spring 019 Lecure Noes Noe 17 17.1 apaciive ouchscreen In he las noe, we saw ha a capacior consiss of wo pieces on conducive maerial separaed by a nonconducive

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

12: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME. Σ j =

12: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME. Σ j = 1: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME Moving Averages Recall ha a whie noise process is a series { } = having variance σ. The whie noise process has specral densiy f (λ) = of

More information

Predator - Prey Model Trajectories and the nonlinear conservation law

Predator - Prey Model Trajectories and the nonlinear conservation law Predaor - Prey Model Trajecories and he nonlinear conservaion law James K. Peerson Deparmen of Biological Sciences and Deparmen of Mahemaical Sciences Clemson Universiy Ocober 28, 213 Ouline Drawing Trajecories

More information

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t Exercise 7 C P = α + β R P + u C = αp + βr + v (a) (b) C R = α P R + β + w (c) Assumpions abou he disurbances u, v, w : Classical assumions on he disurbance of one of he equaions, eg. on (b): E(v v s P,

More information

23.5. Half-Range Series. Introduction. Prerequisites. Learning Outcomes

23.5. Half-Range Series. Introduction. Prerequisites. Learning Outcomes Half-Range Series 2.5 Inroducion In his Secion we address he following problem: Can we find a Fourier series expansion of a funcion defined over a finie inerval? Of course we recognise ha such a funcion

More information

Chapter 7: Solving Trig Equations

Chapter 7: Solving Trig Equations Haberman MTH Secion I: The Trigonomeric Funcions Chaper 7: Solving Trig Equaions Le s sar by solving a couple of equaions ha involve he sine funcion EXAMPLE a: Solve he equaion sin( ) The inverse funcions

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

On Multicomponent System Reliability with Microshocks - Microdamages Type of Components Interaction

On Multicomponent System Reliability with Microshocks - Microdamages Type of Components Interaction On Mulicomponen Sysem Reliabiliy wih Microshocks - Microdamages Type of Componens Ineracion Jerzy K. Filus, and Lidia Z. Filus Absrac Consider a wo componen parallel sysem. The defined new sochasic dependences

More information

5.2. The Natural Logarithm. Solution

5.2. The Natural Logarithm. Solution 5.2 The Naural Logarihm The number e is an irraional number, similar in naure o π. Is non-erminaing, non-repeaing value is e 2.718 281 828 59. Like π, e also occurs frequenly in naural phenomena. In fac,

More information

Solutions to Odd Number Exercises in Chapter 6

Solutions to Odd Number Exercises in Chapter 6 1 Soluions o Odd Number Exercises in 6.1 R y eˆ 1.7151 y 6.3 From eˆ ( T K) ˆ R 1 1 SST SST SST (1 R ) 55.36(1.7911) we have, ˆ 6.414 T K ( ) 6.5 y ye ye y e 1 1 Consider he erms e and xe b b x e y e b

More information

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Aricle from Predicive Analyics and Fuurism July 6 Issue An Inroducion o Incremenal Learning By Qiang Wu and Dave Snell Machine learning provides useful ools for predicive analyics The ypical machine learning

More information

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details!

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details! MAT 257, Handou 6: Ocober 7-2, 20. I. Assignmen. Finish reading Chaper 2 of Spiva, rereading earlier secions as necessary. handou and fill in some missing deails! II. Higher derivaives. Also, read his

More information

Online Convex Optimization Example And Follow-The-Leader

Online Convex Optimization Example And Follow-The-Leader CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion

More information

References are appeared in the last slide. Last update: (1393/08/19)

References are appeared in the last slide. Last update: (1393/08/19) SYSEM IDEIFICAIO Ali Karimpour Associae Professor Ferdowsi Universi of Mashhad References are appeared in he las slide. Las updae: 0..204 393/08/9 Lecure 5 lecure 5 Parameer Esimaion Mehods opics o be

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

Lab 10: RC, RL, and RLC Circuits

Lab 10: RC, RL, and RLC Circuits Lab 10: RC, RL, and RLC Circuis In his experimen, we will invesigae he behavior of circuis conaining combinaions of resisors, capaciors, and inducors. We will sudy he way volages and currens change in

More information

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients Secion 3.5 Nonhomogeneous Equaions; Mehod of Undeermined Coefficiens Key Terms/Ideas: Linear Differenial operaor Nonlinear operaor Second order homogeneous DE Second order nonhomogeneous DE Soluion o homogeneous

More information

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015 Explaining Toal Facor Produciviy Ulrich Kohli Universiy of Geneva December 2015 Needed: A Theory of Toal Facor Produciviy Edward C. Presco (1998) 2 1. Inroducion Toal Facor Produciviy (TFP) has become

More information

RC, RL and RLC circuits

RC, RL and RLC circuits Name Dae Time o Complee h m Parner Course/ Secion / Grade RC, RL and RLC circuis Inroducion In his experimen we will invesigae he behavior of circuis conaining combinaions of resisors, capaciors, and inducors.

More information

NCSS Statistical Software. , contains a periodic (cyclic) component. A natural model of the periodic component would be

NCSS Statistical Software. , contains a periodic (cyclic) component. A natural model of the periodic component would be NCSS Saisical Sofware Chaper 468 Specral Analysis Inroducion This program calculaes and displays he periodogram and specrum of a ime series. This is someimes nown as harmonic analysis or he frequency approach

More information

Maintenance Models. Prof. Robert C. Leachman IEOR 130, Methods of Manufacturing Improvement Spring, 2011

Maintenance Models. Prof. Robert C. Leachman IEOR 130, Methods of Manufacturing Improvement Spring, 2011 Mainenance Models Prof Rober C Leachman IEOR 3, Mehods of Manufacuring Improvemen Spring, Inroducion The mainenance of complex equipmen ofen accouns for a large porion of he coss associaed wih ha equipmen

More information

Single and Double Pendulum Models

Single and Double Pendulum Models Single and Double Pendulum Models Mah 596 Projec Summary Spring 2016 Jarod Har 1 Overview Differen ypes of pendulums are used o model many phenomena in various disciplines. In paricular, single and double

More information

Energy Storage Benchmark Problems

Energy Storage Benchmark Problems Energy Sorage Benchmark Problems Daniel F. Salas 1,3, Warren B. Powell 2,3 1 Deparmen of Chemical & Biological Engineering 2 Deparmen of Operaions Research & Financial Engineering 3 Princeon Laboraory

More information

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still. Lecure - Kinemaics in One Dimension Displacemen, Velociy and Acceleraion Everyhing in he world is moving. Nohing says sill. Moion occurs a all scales of he universe, saring from he moion of elecrons in

More information

Notes for Lecture 17-18

Notes for Lecture 17-18 U.C. Berkeley CS278: Compuaional Complexiy Handou N7-8 Professor Luca Trevisan April 3-8, 2008 Noes for Lecure 7-8 In hese wo lecures we prove he firs half of he PCP Theorem, he Amplificaion Lemma, up

More information

Learning Objectives: Practice designing and simulating digital circuits including flip flops Experience state machine design procedure

Learning Objectives: Practice designing and simulating digital circuits including flip flops Experience state machine design procedure Lab 4: Synchronous Sae Machine Design Summary: Design and implemen synchronous sae machine circuis and es hem wih simulaions in Cadence Viruoso. Learning Objecives: Pracice designing and simulaing digial

More information

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t M ah 5 2 7 Fall 2 0 0 9 L ecure 1 0 O c. 7, 2 0 0 9 Hamilon- J acobi Equaion: Explici Formulas In his lecure we ry o apply he mehod of characerisics o he Hamilon-Jacobi equaion: u + H D u, x = 0 in R n

More information

2.7. Some common engineering functions. Introduction. Prerequisites. Learning Outcomes

2.7. Some common engineering functions. Introduction. Prerequisites. Learning Outcomes Some common engineering funcions 2.7 Inroducion This secion provides a caalogue of some common funcions ofen used in Science and Engineering. These include polynomials, raional funcions, he modulus funcion

More information

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux Gues Lecures for Dr. MacFarlane s EE3350 Par Deux Michael Plane Mon., 08-30-2010 Wrie name in corner. Poin ou his is a review, so I will go faser. Remind hem o go lisen o online lecure abou geing an A

More information

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS NA568 Mobile Roboics: Mehods & Algorihms Today s Topic Quick review on (Linear) Kalman Filer Kalman Filering for Non-Linear Sysems Exended Kalman Filer (EKF)

More information

5.1 - Logarithms and Their Properties

5.1 - Logarithms and Their Properties Chaper 5 Logarihmic Funcions 5.1 - Logarihms and Their Properies Suppose ha a populaion grows according o he formula P 10, where P is he colony size a ime, in hours. When will he populaion be 2500? We

More information

Math 333 Problem Set #2 Solution 14 February 2003

Math 333 Problem Set #2 Solution 14 February 2003 Mah 333 Problem Se #2 Soluion 14 February 2003 A1. Solve he iniial value problem dy dx = x2 + e 3x ; 2y 4 y(0) = 1. Soluion: This is separable; we wrie 2y 4 dy = x 2 + e x dx and inegrae o ge The iniial

More information

Phys1112: DC and RC circuits

Phys1112: DC and RC circuits Name: Group Members: Dae: TA s Name: Phys1112: DC and RC circuis Objecives: 1. To undersand curren and volage characerisics of a DC RC discharging circui. 2. To undersand he effec of he RC ime consan.

More information

ACE 562 Fall Lecture 8: The Simple Linear Regression Model: R 2, Reporting the Results and Prediction. by Professor Scott H.

ACE 562 Fall Lecture 8: The Simple Linear Regression Model: R 2, Reporting the Results and Prediction. by Professor Scott H. ACE 56 Fall 5 Lecure 8: The Simple Linear Regression Model: R, Reporing he Resuls and Predicion by Professor Sco H. Irwin Required Readings: Griffihs, Hill and Judge. "Explaining Variaion in he Dependen

More information

Single-Pass-Based Heuristic Algorithms for Group Flexible Flow-shop Scheduling Problems

Single-Pass-Based Heuristic Algorithms for Group Flexible Flow-shop Scheduling Problems Single-Pass-Based Heurisic Algorihms for Group Flexible Flow-shop Scheduling Problems PEI-YING HUANG, TZUNG-PEI HONG 2 and CHENG-YAN KAO, 3 Deparmen of Compuer Science and Informaion Engineering Naional

More information

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing Applicaion of a Sochasic-Fuzzy Approach o Modeling Opimal Discree Time Dynamical Sysems by Using Large Scale Daa Processing AA WALASZE-BABISZEWSA Deparmen of Compuer Engineering Opole Universiy of Technology

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

Solutions for Assignment 2

Solutions for Assignment 2 Faculy of rs and Science Universiy of Torono CSC 358 - Inroducion o Compuer Neworks, Winer 218 Soluions for ssignmen 2 Quesion 1 (2 Poins): Go-ack n RQ In his quesion, we review how Go-ack n RQ can be

More information

Two Coupled Oscillators / Normal Modes

Two Coupled Oscillators / Normal Modes Lecure 3 Phys 3750 Two Coupled Oscillaors / Normal Modes Overview and Moivaion: Today we ake a small, bu significan, sep owards wave moion. We will no ye observe waves, bu his sep is imporan in is own

More information

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves Rapid Terminaion Evaluaion for Recursive Subdivision of Bezier Curves Thomas F. Hain School of Compuer and Informaion Sciences, Universiy of Souh Alabama, Mobile, AL, U.S.A. Absrac Bézier curve flaening

More information

STA 114: Statistics. Notes 2. Statistical Models and the Likelihood Function

STA 114: Statistics. Notes 2. Statistical Models and the Likelihood Function STA 114: Saisics Noes 2. Saisical Models and he Likelihood Funcion Describing Daa & Saisical Models A physicis has a heory ha makes a precise predicion of wha s o be observed in daa. If he daa doesn mach

More information

1 Differential Equation Investigations using Customizable

1 Differential Equation Investigations using Customizable Differenial Equaion Invesigaions using Cusomizable Mahles Rober Decker The Universiy of Harford Absrac. The auhor has developed some plaform independen, freely available, ineracive programs (mahles) for

More information

Generalized Least Squares

Generalized Least Squares Generalized Leas Squares Augus 006 1 Modified Model Original assumpions: 1 Specificaion: y = Xβ + ε (1) Eε =0 3 EX 0 ε =0 4 Eεε 0 = σ I In his secion, we consider relaxing assumpion (4) Insead, assume

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

SOLUTIONS TO ECE 3084

SOLUTIONS TO ECE 3084 SOLUTIONS TO ECE 384 PROBLEM 2.. For each sysem below, specify wheher or no i is: (i) memoryless; (ii) causal; (iii) inverible; (iv) linear; (v) ime invarian; Explain your reasoning. If he propery is no

More information

BBP-type formulas, in general bases, for arctangents of real numbers

BBP-type formulas, in general bases, for arctangents of real numbers Noes on Number Theory and Discree Mahemaics Vol. 19, 13, No. 3, 33 54 BBP-ype formulas, in general bases, for arcangens of real numbers Kunle Adegoke 1 and Olawanle Layeni 2 1 Deparmen of Physics, Obafemi

More information

Lecture 4 Notes (Little s Theorem)

Lecture 4 Notes (Little s Theorem) Lecure 4 Noes (Lile s Theorem) This lecure concerns one of he mos imporan (and simples) heorems in Queuing Theory, Lile s Theorem. More informaion can be found in he course book, Bersekas & Gallagher,

More information

Errata (1 st Edition)

Errata (1 st Edition) P Sandborn, os Analysis of Elecronic Sysems, s Ediion, orld Scienific, Singapore, 03 Erraa ( s Ediion) S K 05D Page 8 Equaion (7) should be, E 05D E Nu e S K he L appearing in he equaion in he book does

More information

MATH 4330/5330, Fourier Analysis Section 6, Proof of Fourier s Theorem for Pointwise Convergence

MATH 4330/5330, Fourier Analysis Section 6, Proof of Fourier s Theorem for Pointwise Convergence MATH 433/533, Fourier Analysis Secion 6, Proof of Fourier s Theorem for Poinwise Convergence Firs, some commens abou inegraing periodic funcions. If g is a periodic funcion, g(x + ) g(x) for all real x,

More information