Sequential Change-Point Detection via Online Convex Optimization

Yang Cao, Liyan Xie, Yao Xie * and Huan Xu

H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; caoyang@gatech.edu (Y.C.); lxie49@gatech.edu (L.X.); huan.xu@isye.gatech.edu (H.X.)
* Correspondence: yao.xie@isye.gatech.edu

Received: September 2017; Accepted: 5 February 2018; Published: 7 February 2018

Abstract: Sequential change-point detection when the distribution parameters are unknown is a fundamental problem in statistics and machine learning. When the post-change parameters are unknown, we consider a set of detection procedures based on sequential likelihood ratios with non-anticipating estimators constructed using online convex optimization algorithms such as online mirror descent, which provides a more versatile approach to tackling complex situations where recursive maximum likelihood estimators cannot be found. When the underlying distributions belong to an exponential family and the estimators satisfy the logarithmic regret property, we show that this approach is nearly second-order asymptotically optimal: the upper bound on the detection delay of the algorithm (for a false-alarm rate measured by the average run length) meets the lower bound asymptotically, up to a log-log factor, as the threshold tends to infinity. Our proof is achieved by making a connection between sequential change-point detection and online convex optimization and leveraging the logarithmic regret bound property of the online mirror descent algorithm. Numerical and real data examples validate our theory.

Keywords: sequential methods; change-point detection; online algorithms

1. Introduction

Sequential analysis is a classic topic in statistics concerning online inference from a sequence of observations. The goal is to make statistical inference as quickly as possible, while controlling the false-alarm rate. An important and commonly studied sequential analysis problem is sequential change-point detection [1].
It arises from various applications including online anomaly detection, statistical quality control, biosurveillance, financial arbitrage detection and network security monitoring (see, e.g., [2–4]). We are interested in the sequential change-point detection problem with known pre-change parameters but unknown post-change parameters. Specifically, given a sequence of samples $X_1, X_2, \ldots$, we assume that they are independent and identically distributed (i.i.d.) with some distribution $f_\theta$ parameterized by $\theta$, and that the values of $\theta$ differ before and after some unknown time called the change-point. We further assume that the parameters before the change-point are known. This is reasonable since it is usually relatively easy to obtain reference data for the normal state, so that the parameters in the normal state can be estimated with good accuracy. After the change-point, however, the values of the parameters switch to some unknown values, which represent anomalies or novelties that need to be discovered.

1.1. Motivation: Dilemma of CUSUM and Generalized Likelihood Ratio (GLR) Statistics

Consider change-point detection with unknown post-change parameters. A commonly used change-point detection method is the so-called CUSUM procedure [4], which can be derived from likelihood ratios.

Entropy 2018, 20, 108; doi:10.3390/e

Assume that before the change the samples $X_i$ follow a distribution $f_{\theta_0}$, and after the change the samples $X_i$ follow another distribution $f_{\theta_1}$. The CUSUM procedure has a recursive structure: initialized with $W_0 = 0$, the likelihood-ratio statistic can be computed according to

  $W_{t+1} = \max\{W_t + \log(f_{\theta_1}(X_{t+1})/f_{\theta_0}(X_{t+1})),\, 0\}$,

and a change-point is detected whenever $W_t$ exceeds a pre-specified threshold. Due to the recursive structure, CUSUM is memory and computation efficient: it does not need to store the historical data and only needs to record the value of $W_t$. The performance of CUSUM depends on the choice of the post-change parameter $\theta_1$; in particular, there must be a well-defined notion of distance between $\theta_0$ and $\theta_1$. However, the choice of $\theta_1$ is somewhat subjective. Even if in practice a reasonable choice of $\theta_1$ is the smallest change of interest, in the multi-dimensional setting it is hard to define what the smallest change would mean. Moreover, when the assumed parameter $\theta_1$ deviates significantly from the true parameter value, CUSUM may suffer severe performance degradation [5]. An alternative approach is the Generalized Likelihood Ratio (GLR) statistic based procedure [6]. The GLR statistic finds the maximum likelihood estimate (MLE) of the post-change parameter and plugs it back into the likelihood ratio to form the detection statistic. To be more precise, for each hypothetical change-point location $k$, the corresponding post-change samples are $\{X_{k+1}, \ldots, X_t\}$. Using these samples, one can form the MLE, denoted $\hat\theta_{k+1,t}$. Without knowing beforehand whether and where the change occurs, when forming the GLR statistic we have to maximize over all possible change locations. The GLR statistic is given by

  $\max_{1 \le k < t} \sum_{i=k+1}^{t} \log(f_{\hat\theta_{k+1,t}}(X_i)/f_{\theta_0}(X_i))$,

and a change is announced whenever it exceeds a pre-specified threshold. The GLR statistic is more robust than CUSUM [7], and it is particularly useful when the post-change parameter may vary from one situation to another.
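As an illustration of the recursion above, here is a minimal sketch of CUSUM for a unit-variance Gaussian mean shift; the function name, threshold and parameter values are our own illustrative choices, not from the paper.

```python
def cusum_gaussian_mean(samples, theta0=0.0, theta1=1.0, sigma=1.0, b=5.0):
    """CUSUM for a mean shift theta0 -> theta1 in Gaussian data (sketch).

    W_{t+1} = max(W_t + log f_theta1(x)/f_theta0(x), 0); alarm when W_t >= b.
    """
    W = 0.0
    for t, x in enumerate(samples, start=1):
        # log-likelihood ratio of N(theta1, sigma^2) vs N(theta0, sigma^2)
        llr = (theta1 - theta0) * (x - (theta0 + theta1) / 2.0) / sigma**2
        W = max(W + llr, 0.0)
        if W >= b:
            return t  # alarm time
    return None  # no alarm raised
```

Note how the statistic resets to zero under pre-change data, so no history needs to be stored.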
In simple cases, the MLE $\hat\theta_{k+1,t}$ may have a closed-form expression and may be evaluated recursively. For instance, when the post-change distribution is Gaussian with mean $\theta$ [8],

  $\hat\theta_{k+1,t} = \frac{1}{t-k}\sum_{i=k+1}^{t} X_i$, and $\hat\theta_{k+1,t+1} = \frac{t-k}{t-k+1}\,\hat\theta_{k+1,t} + \frac{X_{t+1}}{t-k+1}$.

However, in more complex situations, the MLE $\hat\theta_{k+1,t}$ generally has no recursive form and cannot be evaluated using simple summary statistics. One such instance is given in Section 1.2. Another instance is when there is a constraint on the MLE, such as sparsity. In these cases, one has to store the historical data and recompute the MLE $\hat\theta_{k+1,t}$ whenever new data arrive, which is neither memory nor computationally efficient. For these cases, as a remedy, the window-limited GLR is usually considered, where only the past $w$ samples are stored and the maximization is restricted to $k \in (t-w, t]$. However, even with the window-limited GLR, one still has to recompute $\hat\theta_{k,t}$ from historical data whenever new data are added. Besides CUSUM and GLR, various online change-point detection procedures using one-sample updates have been considered, which replace the MLE with a simple recursive estimator. The one-sample update estimator takes the form $\hat\theta_{k,t} = h(X_t, \hat\theta_{k,t-1})$ for some function $h$ that uses only the most recent sample and the previous estimate. The estimates are then plugged into the likelihood ratio statistic to perform detection. Online convex optimization algorithms (such as online mirror descent) are a natural approach to constructing these estimators (see, e.g., [9,10]). Such a scheme provides a more versatile approach to developing detection procedures for complex situations where the exact MLE does not have a recursive form, or even a closed-form expression. The one-sample update is computationally efficient, as information from each new sample can be incorporated at low cost; it is also memory efficient, since the update needs only the most recent sample. The one-sample update estimators may not correspond to the exact MLE, but they tend to yield good detection performance.
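For the Gaussian case above, the recursive MLE is just a running mean; a minimal sketch (the function name is ours):

```python
def recursive_mle_update(theta_hat, n, x_new):
    """One step of the recursive Gaussian-mean MLE: given the estimate
    theta_hat based on n samples, fold in x_new without storing history."""
    return (n / (n + 1)) * theta_hat + x_new / (n + 1)
```

Folding this update over a stream reproduces the batch sample mean exactly, which is what makes the closed-form Gaussian case so convenient.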
However, in general, there are no performance guarantees for such an approach. This is the question we aim to address in this paper.

1.2. Application Scenario: Social Network Change-Point Detection

The widespread use of social networks (such as Twitter) leads to a large amount of user-generated data produced continuously. One important task is to detect change-points in streaming social network data. These change-points may represent the collective anticipation of, or response to, external

events or system shocks [11]. Detecting such changes can provide a better understanding of the patterns of social life. In social networks, a common form of data is discrete events over continuous time. As a simplification, each event contains a time label and a user label in the network. In our prior work [12], we model discrete events using network point processes, which capture the influence between users through an influence matrix. We then cast the problem as detecting changes in the influence matrix, assuming that the influence matrix in the normal state (before the change) can be estimated from reference data. After the change, the influence matrix is unknown (since it represents an anomaly) and has to be estimated online. Because the scale of the network tends to be large, computational burden and memory constraints mean that we do not want to store the entire history, but rather compute the statistic in real time. A simulated example illustrating this case is shown in a later section.

1.3. Contributions

This paper has two main contributions. First, we present a general approach based on online convex optimization (OCO) for constructing the estimators for the one-sided sequential hypothesis test and sequential change-point detection, following the non-anticipating approach of [8], when the MLE cannot be computed in a convenient recursive form. Second, we provide a proof of the near second-order asymptotic optimality of this approach when a logarithmic regret property is satisfied and the distributions are from an exponential family. Nearly second-order asymptotic optimality [14] means that the upper bound on performance matches the lower bound up to a log-log factor as the false-alarm rate tends to zero. Inspired by the existing connection between sequential analysis and online convex optimization in [13,14], we prove the near optimality by leveraging the logarithmic regret property of online mirror descent (OMD) and the lower bounds established in the statistical sequential change-point literature [14,15].
More precisely, we provide a general upper bound for the one-sided sequential hypothesis test and change-point detection procedures with one-sample update schemes. The upper bound explicitly captures the impact of estimation on detection through an estimation-algorithm-dependent factor. This factor shows up as an additional term in the upper bound on the expected detection delay, and it corresponds to the regret incurred by the one-sample update estimators. This establishes an interesting link between sequential change-point detection and online convex optimization. Although both fields study sequential data, the precise connection between them has not been clear, partly because the performance metrics are different: the former is concerned with the tradeoff between average run length and detection delay, whereas the latter focuses on bounding the cumulative loss incurred by the sequence of estimators through a regret bound [14,16]. Synthetic examples validate the performance of the one-sample update schemes. Here we focus on OMD estimators, but the results can be generalized to other OCO schemes such as online gradient descent.

1.4. Literature and Related Work

Sequential change-point detection is a classic subject with an extensive literature. Much success has been achieved when the pre-change and post-change distributions are exactly specified: for example, the CUSUM procedure [17], with first-order asymptotic optimality [18] and exact optimality [19] in the minimax sense, and the Shiryaev–Roberts (SR) procedure [20], derived from Bayesian principles, which also enjoys various optimality properties. Both the CUSUM and SR procedures rely on likelihood ratios between the specified pre-change and post-change distributions. There are two main approaches to dealing with unknown post-change parameters. The first is the GLR approach [7,21–24], and the second is a mixture approach [15,25]. The GLR statistic enjoys certain optimality properties, but it cannot be computed recursively in many cases [23].
To address the infinite memory issue, [7,21] studied the window-limited GLR procedure. The main advantage of the mixture approach is that it allows easy evaluation of a threshold that guarantees the desired

false alarm constraint. A disadvantage of this approach is that there may be no natural way to select the weight function, in particular when there is no conjugate prior. This motivated a third approach to the problem, proposed first by Robbins and Siegmund in the context of hypothesis testing, and then by Lorden and Pollak [8] for sequential change detection. This approach replaces the unknown parameter with a non-anticipating estimator, which can be easier to find even when there is no conjugate prior, as in the Gamma example considered in [8,25]. That work developed a modified SR procedure by introducing a prior distribution on the unknown parameters. While the non-anticipating estimator approach [8,24] enjoys recursive, and thus efficient, computation of the likelihood-ratio-based detection statistics, its constructions of recursive estimators (based on the MLE or the method of moments) cannot easily be extended to more complex cases (for instance, multi-dimensional parameters with constraints). Here, we consider a general and convenient approach for constructing non-anticipating estimators based on online convex optimization, which is particularly useful in these complex cases. Our work provides an alternative proof of nearly second-order asymptotic optimality by building a connection to online convex optimization and leveraging regret-bound-type results [14]. For a one-dimensional Gaussian mean shift without any constraint, we recover the second-order asymptotic optimality, namely Theorem 3.3 in [24]. Recent work [26] also treats the problem where the pre-change distribution has unknown parameters. Another related problem is sequential joint estimation and detection, but its goal is different, in that one aims to achieve both good detection and good estimation performance, whereas in our setting estimation is only needed for computing the detection statistics.
These works include [27] and [28], which study a joint detection and estimation problem of a specific form arising in many applications such as spectrum sensing [29], image observations [30], and MIMO radar [31]: a linear scalar observation model with Gaussian noise, where under the alternative hypothesis there is an unknown multiplicative parameter. The paper [27] demonstrates that solving the joint problem by treating detection and estimation separately with the corresponding optimal procedures does not yield overall optimal performance, and provides an elegant closed-form optimal detector; [28] later generalizes these results. There are also other approaches to the joint detection-estimation problem using multiple hypothesis testing [30,32] and Bayesian formulations [33]. Related work using online convex optimization for anomaly detection includes [9], which develops an efficient detector for the exponential family using online mirror descent and proves a logarithmic regret bound, and [10], which dynamically adjusts the detection threshold to incorporate feedback about the decision outcomes. However, these works consider a different setting, in which the change is a transient outlier rather than a persistent change, as assumed in the classic statistical change-point detection literature. When there is a persistent change, it is important to accumulate evidence by pooling the post-change samples (our work considers the persistent change). Extensive work has been done on parameter estimation in the online setting. This includes online density estimation over the exponential family by regret minimization [9,10,16], sequential prediction of individual sequences with the logarithmic loss [3,34], online prediction for time series [35], and sequential NML (SNML) prediction [34], which achieves the optimal regret bound. Our problem differs from the above in that estimation is not the end goal; one performs parameter estimation only to plug the estimates back into the likelihood function for detection.
Moreover, a subtle but important difference in our work is that the loss function in online density estimation is $-\log f_{\hat\theta_i}(X_i)$, whereas our loss function is $-\log f_{\hat\theta_{i-1}}(X_i)$, in order to retain the martingale property, which is essential for establishing nearly second-order asymptotic optimality.

2. Preliminaries

Consider a sequence of i.i.d. random variables $X_1, X_2, \ldots$ with probability density function of a parametric form $f_\theta$. The parameter $\theta$ may be unknown. We consider two related problems: the one-sided sequential hypothesis test and sequential change-point detection. The detection statistic relies on a sequence of estimators $\{\hat\theta_t\}$ constructed using online mirror descent. The OMD uses a simple one-sample

update: the update from $\hat\theta_{t-1}$ to $\hat\theta_t$ uses only the current sample $X_t$. This is the main difference from the traditional generalized likelihood ratio (GLR) statistic [7], where each $\hat\theta_t$ is estimated using all historical samples. In the following, we present detailed descriptions of the two problems. We consider exponential family distributions and present our non-anticipating estimator based on the one-sample update.

2.1. One-Sided Sequential Hypothesis Test

First, we consider a one-sided sequential hypothesis test in which the goal is only to reject the null hypothesis. This is a special case of the change-detection problem in which the change-point can only be either 0 or $\infty$ (meaning it never occurs). Studying this special case will give us an important intermediate step towards solving the sequential change-detection problem. Consider the null hypothesis $H_0: \theta = \theta_0$ versus the alternative $H_1: \theta \neq \theta_0$; hence, the parameter under the alternative is unknown. The classic approach to this problem is the one-sided sequential probability ratio test (SPRT) [36]: at each time $t$, given samples $\{X_1, \ldots, X_t\}$, the decision is either to reject $H_0$ or to take more samples if the rejection decision cannot be made confidently. Here, we introduce a modified one-sided SPRT with a sequence of non-anticipating plug-in estimators:

  $\hat\theta_t := \hat\theta_t(X_1, \ldots, X_t), \quad t = 1, 2, \ldots$  (1)

Define the test statistic at time $t$ as

  $L_t = \prod_{i=1}^{t} \frac{f_{\hat\theta_{i-1}}(X_i)}{f_{\theta_0}(X_i)}$.  (2)

The test statistic has a simple recursive implementation:

  $L_t = L_{t-1} \cdot \frac{f_{\hat\theta_{t-1}}(X_t)}{f_{\theta_0}(X_t)}$.

Define a sequence of $\sigma$-algebras $\{\mathcal{F}_t\}$, where $\mathcal{F}_t = \sigma(X_1, \ldots, X_t)$. The test statistic has the martingale property due to its non-anticipating nature: $\mathbb{E}[L_t \mid \mathcal{F}_{t-1}] = L_{t-1}$, where the expectation is taken when $X_1, X_2, \ldots$ are i.i.d. random variables drawn from $f_{\theta_0}$. The decision rule is a stopping time

  $\tau(b) = \min\{t \geq 1 : \log L_t \geq b\}$,  (3)

where $b > 0$ is a pre-specified threshold. We reject the null hypothesis whenever the statistic exceeds the threshold.
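A minimal sketch of this modified SPRT for a unit-variance Gaussian, with the running mean as the non-anticipating plug-in estimator (illustrative code of ours, not from the paper; the threshold is arbitrary):

```python
def one_sided_sprt(samples, theta0=0.0, b=4.0):
    """One-sided SPRT with a non-anticipating plug-in estimator (sketch).

    For unit-variance Gaussian data, log L_t = log L_{t-1}
    + log f_{theta_hat_{t-1}}(X_t) - log f_{theta0}(X_t); theta_hat is the
    running mean of past samples, updated only AFTER X_t is scored."""
    logL, theta_hat = 0.0, theta0  # theta_hat_0 initialized at theta0
    for t, x in enumerate(samples, start=1):
        # log-likelihood ratio term uses the previous estimate (non-anticipating)
        logL += (theta_hat - theta0) * (x - (theta_hat + theta0) / 2.0)
        theta_hat += (x - theta_hat) / t  # one-sample update (running mean)
        if logL >= b:
            return t  # stopping time tau(b)
    return None  # never rejected H0
```

The key point visible in the code is the ordering: each $X_t$ enters the likelihood ratio before it touches the estimator, which is exactly what preserves the martingale property.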
The goal is to reject the null hypothesis using as few samples as possible, subject to a false-alarm rate (Type-I error) constraint.

2.2. Sequential Change-Point Detection

Now we consider the sequential change-point detection problem. A change may occur at an unknown time $n$, altering the underlying distribution of the data. One would like to detect such a change as quickly as possible. Formally, change-point detection can be cast as the following hypothesis test:

  $H_0: X_1, X_2, \ldots \sim f_{\theta_0}$ i.i.d.,
  $H_1: X_1, \ldots, X_n \sim f_{\theta_0}$ i.i.d., $\; X_{n+1}, X_{n+2}, \ldots \sim f_{\theta}$ i.i.d.  (4)

Here, the unknown $\theta$ represents the anomaly. The goal is to detect the change as quickly as possible after it occurs, subject to a false-alarm rate constraint. We consider likelihood-ratio-based

detection procedures adapted from two types of existing ones, which we call the adaptive CUSUM (ACM) and the adaptive SRRS (ASR) procedures. For change-point detection, the post-change parameter is estimated using post-change samples. This means that, for each putative change-point location $k < t$ before the current time, the post-change samples are $\{X_k, \ldots, X_t\}$; with a slight abuse of notation, the post-change parameter is estimated as

  $\hat\theta_{k,i} = \hat\theta_{k,i}(X_k, \ldots, X_i), \quad i \geq k$.  (5)

Therefore, for $k = 1$, $\hat\theta_{k,i}$ becomes $\hat\theta_i$ as defined in (1) for the one-sided SPRT. Initialize with $\hat\theta_{k,k-1} = \theta_0$. The likelihood ratio at time $t$ for a hypothetical change-point location $k$ is given by

  $L_{k,t} = \prod_{i=k}^{t} \frac{f_{\hat\theta_{k,i-1}}(X_i)}{f_{\theta_0}(X_i)}$,  (6)

where $L_{k,t}$ can be computed recursively, similar to (2). Since we do not know the change-point location $n$, following the maximum likelihood principle we take the maximum of the statistics over all possible values of $k$. This gives the ACM procedure:

  $T_{\mathrm{ACM}}(b_1) = \inf\{t : \max_{1 \leq k \leq t} \log L_{k,t} > b_1\}$,  (7)

where $b_1$ is a pre-specified threshold. Similarly, by replacing the maximization over $k$ in (7) with summation, we obtain the following ASR procedure [8], which can be interpreted as a Bayesian statistic similar to the Shiryaev–Roberts procedure:

  $T_{\mathrm{ASR}}(b_2) = \inf\{t : \log(\sum_{k=1}^{t} L_{k,t}) > b_2\}$,  (8)

where $b_2$ is a pre-specified threshold. The computation of $L_{k,t}$ and the estimators $\{\hat\theta_t\}$, $\{\hat\theta_{k,t}\}$ are discussed later in Section 2.4. For a fixed $k$, the comparison between our methods and the GLR is illustrated in Figure 1.

Remark 1. In practice, to prevent the memory and computational complexity from blowing up as time goes to infinity, we can use window-limited versions of the detection procedures in (7) and (8), obtained by replacing $\max_{1 \leq k \leq t}$ with $\max_{t-w \leq k \leq t}$ in (7) and $\sum_{k=1}^{t}$ with $\sum_{k=t-w}^{t}$ in (8), where $w$ is a prescribed window size. Although we do not provide a theoretical analysis of the window-limited versions, we refer the reader to [7] for the choice of $w$ in window-limited GLR procedures.
Figure 1. Comparison of the update scheme for GLR and our methods when a new sample arrives: the GLR recomputes the MLE $\hat\theta_{k,t+1}$ from all samples $X_k, \ldots, X_{t+1}$, whereas the one-sample update computes $\hat\theta_{k,t+1}$ from $\hat\theta_{k,t}$ and $X_{t+1}$ alone, with the recursive ratio update $\Lambda_{k,t+1} = \Lambda_{k,t} \cdot f_{\hat\theta_{k,t}}(X_{t+1})/f_{\theta_0}(X_{t+1})$.
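The window-limited ACM procedure of (7) can be sketched as follows for a unit-variance Gaussian mean shift, again with the running mean as the one-sample update estimator (our illustrative code; threshold and window size are arbitrary):

```python
def acm_window(samples, theta0=0.0, b=8.0, w=20):
    """Window-limited ACM sketch (unit-variance Gaussian, running-mean update).

    For each putative change point k in the window, maintain the recursively
    updated log L_{k,t} and estimator theta_hat_{k,t}; alarm when
    max_k log L_{k,t} > b."""
    stats = []  # [logL, theta_hat, n] for each active hypothesis k
    for t, x in enumerate(samples, start=1):
        stats.append([0.0, theta0, 0])  # new hypothesis: change at k = t
        for s in stats:
            logL, th, n = s
            # recursive ratio update uses the previous estimate for this k
            s[0] = logL + (th - theta0) * (x - (th + theta0) / 2.0)
            s[1] = th + (x - th) / (n + 1)  # one-sample update
            s[2] = n + 1
        if len(stats) > w:
            stats.pop(0)  # keep only the last w putative change points
        if max(s[0] for s in stats) > b:
            return t  # T_ACM
    return None
```

Each hypothesis in the window carries only a scalar statistic and a scalar estimate, so memory is $O(w)$ regardless of the stream length, in contrast to the window-limited GLR, which re-estimates from raw samples.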

2.3. Exponential Family

In this paper, we focus on $f_\theta$ in an exponential family, for the following reasons: (i) the exponential family [10] represents a very rich class of parametric, and even many nonparametric, statistical models [37]; (ii) the negative log-likelihood $-\log f_\theta(x)$ of an exponential family is convex, which allows us to perform online convex optimization. Some useful properties of the exponential family are briefly summarized below; full proofs can be found in [10,38]. Consider an observation space $\mathcal{X}$ equipped with a sigma-algebra $\mathcal{B}$ and a sigma-finite measure $H$ on $(\mathcal{X}, \mathcal{B})$. Assume the number of parameters is $d$. Let $x^\top$ denote the transpose of a vector or matrix. Let $\phi: \mathcal{X} \to \mathbb{R}^d$ be an $H$-measurable function $\phi(x) = (\phi_1(x), \ldots, \phi_d(x))^\top$; here $\phi(x)$ corresponds to the sufficient statistic for $\theta$. Let $\Theta$ denote the parameter space in $\mathbb{R}^d$, and let $\{P_\theta, \theta \in \Theta\}$ be a set of probability distributions with respect to the measure $H$. Then $\{P_\theta, \theta \in \Theta\}$ is said to be a multivariate exponential family with natural parameter $\theta$ if the probability density function of each $f_\theta \in P_\theta$ with respect to $H$ can be expressed as

  $f_\theta(x) = \exp\{\theta^\top \phi(x) - F(\theta)\}$.

In this definition, the so-called log-partition function is given by

  $F(\theta) := \log \int_{\mathcal{X}} \exp(\theta^\top \phi(x))\, dH(x)$.

To make $f_\theta(x)$ a well-defined probability density, we consider the following two parameter sets:

  $\Theta = \{\theta \in \mathbb{R}^d : \log \int_{\mathcal{X}} \exp(\theta^\top \phi(x))\, dH(x) < +\infty\}$, and $\Theta_s = \{\theta \in \Theta : \nabla^2 F(\theta) \succeq s I_{d \times d}\}$.

Note that $-\log f_\theta(x)$ is $s$-strongly convex over $\Theta_s$. The gradient satisfies $\nabla F(\theta) = \mathbb{E}_\theta[\phi(X)]$, and the Hessian $\nabla^2 F(\theta)$ is the covariance matrix of the vector $\phi(X)$. Therefore, $\nabla^2 F(\theta)$ is positive semidefinite and $F(\theta)$ is convex. Moreover, $F$ is a Legendre function, which means it is strongly convex, continuously differentiable and essentially smooth [38]. The Legendre–Fenchel dual $F^*$ is defined as

  $F^*(z) = \sup_{u \in \Theta}\{u^\top z - F(u)\}$.

The mapping $\nabla F^*$ is the inverse mapping of $\nabla F$ [39]; moreover, if $F$ is a strongly convex function, then $\nabla F^* = (\nabla F)^{-1}$.
A general measure of proximity used in OMD is the so-called Bregman divergence $B_F$, a nonnegative function induced by a Legendre function $F$ (see, e.g., [10,38]), defined as

  $B_F(u, v) := F(u) - F(v) - \langle \nabla F(v), u - v \rangle$.  (9)

For the exponential family, a natural choice of Bregman divergence is the Kullback–Leibler (KL) divergence. Define $\mathbb{E}_\theta$ as the expectation when $X$ is a random variable with density $f_\theta$, and $I(\theta_1, \theta_2)$ as the KL divergence between the distributions with densities $f_{\theta_1}$ and $f_{\theta_2}$, for any $\theta_1, \theta_2 \in \Theta$:

  $I(\theta_1, \theta_2) = \mathbb{E}_{\theta_1}[\log(f_{\theta_1}(X)/f_{\theta_2}(X))]$.  (10)

It can be shown that, for the exponential family,

  $I(\theta_1, \theta_2) = F(\theta_2) - F(\theta_1) - (\theta_2 - \theta_1)^\top \nabla F(\theta_1)$.

Using definition (9), this means that

  $B_F(\theta_1, \theta_2) := I(\theta_2, \theta_1)$  (11)

is a Bregman divergence. This property is useful for constructing mirror descent estimators for the exponential family [39,40].
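A quick numerical check of (9) and (11) for a unit-variance Gaussian in its natural parameterization, where $F(\theta) = \theta^2/2$ and both $B_F(u, v)$ and the KL divergence $I(v, u)$ reduce to $(u - v)^2/2$ (a scalar sketch of ours, not library code):

```python
def bregman(F, grad_F, u, v):
    """Bregman divergence B_F(u, v) = F(u) - F(v) - <grad F(v), u - v>, scalars."""
    return F(u) - F(v) - grad_F(v) * (u - v)

# Unit-variance Gaussian, natural parameter theta (= mean): F(theta) = theta^2/2
F = lambda theta: 0.5 * theta * theta
grad_F = lambda theta: theta

def kl_gaussian(theta1, theta2):
    """KL divergence I(theta1, theta2) between N(theta1, 1) and N(theta2, 1)."""
    return 0.5 * (theta1 - theta2) ** 2
```

Here `bregman(F, grad_F, u, v)` equals `kl_gaussian(v, u)`, with the arguments swapped exactly as in (11).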

2.4. Online Convex Optimization (OCO) Algorithms for Non-Anticipating Estimators

Online convex optimization (OCO) algorithms [41] can be interpreted as a player making sequential decisions. At the time of each decision, the outcome is unknown to the player. After committing to a decision, the decision maker suffers a loss, which can be adversarially chosen. An OCO algorithm makes decisions that, based on the observed outcomes, minimize the regret, i.e., the difference between the total loss incurred and that of the best fixed decision in hindsight. To design non-anticipating estimators, we consider OCO algorithms with likelihood-based regret functions. We iteratively estimate the parameter when each new observation becomes available, based on the maximum likelihood principle; hence the loss incurred is the negative log-likelihood of the new sample evaluated at the estimator, $\ell_t(\theta) := -\log f_\theta(X_t)$, which corresponds to the log-loss in [3]. Given samples $X_1, \ldots, X_t$, the regret of a sequence of estimators $\{\hat\theta_i\}$ generated by a likelihood-based OCO algorithm $a$ is defined as

  $R_t^a = \sum_{i=1}^{t}\{-\log f_{\hat\theta_{i-1}}(X_i)\} - \inf_{\theta \in \Theta} \sum_{i=1}^{t}\{-\log f_\theta(X_i)\}$.  (12)

Below, we occasionally omit the superscript $a$ for notational simplicity. In this paper, we consider a generic OCO procedure called the online mirror descent (OMD) algorithm [4,41]. Next, we discuss how to construct the non-anticipating estimators $\{\hat\theta_t\}$ in (1), and $\{\hat\theta_{k,t}\}$, $k = 1, 2, \ldots$, in (5) using OMD. The main idea of OMD is the following: at each time step, the estimator $\hat\theta_t$ is updated using the new sample $X_t$, balancing the tendency to stay close to the previous estimate against the tendency to move in the direction of the greatest local decrease of the loss function. For the loss function defined above, a sequence of OMD estimators is constructed by

  $\hat\theta_t = \arg\min_{u \in G}\, [u^\top \nabla \ell_t(\hat\theta_{t-1}) + \tfrac{1}{\eta_t} B_F(u, \hat\theta_{t-1})]$,  (13)

where $B_F$ is defined in (11). Here $G \subseteq \Theta_s$ is a closed convex set, which is problem-specific and encourages certain parameter structure such as sparsity.

Remark 2.
Similar to (13), for any fixed $k$ we can compute $\{\hat\theta_{k,t}\}$ via OMD for sequential change-point detection. The only difference is that $\{\hat\theta_{k,t}\}$ is computed using $X_k$ as the first sample and then applying the recursive update (13) to $X_{k+1}, \ldots$; for $\hat\theta_t$, we use $X_1$ as the first sample.

There is an equivalent form of OMD, presented as the original formulation in [40]. The equivalent form is sometimes easier to use for algorithm development, and it consists of four steps: (1) compute the dual variable: $\hat\mu_{t-1} = \nabla F(\hat\theta_{t-1})$; (2) perform the dual update: $\hat\mu_t = \hat\mu_{t-1} - \eta_t \nabla \ell_t(\hat\theta_{t-1})$; (3) compute the primal variable: $\tilde\theta_t = (\nabla F)^{-1}(\hat\mu_t)$; (4) perform the projected primal update: $\hat\theta_t = \arg\min_{u \in G} B_F(u, \tilde\theta_t)$. The equivalence between this form of OMD and the nonlinear projected subgradient approach in (13) is proved in [39]. We adopt this approach when deriving our algorithm and follow the same strategy as [9]. Algorithm 1 summarizes the steps [42]. For strongly convex loss functions, the regret of many OCO algorithms, including OMD, satisfies $R_n \leq C \log n$ for some constant $C$ (depending on $f_\theta$ and $\Theta_s$) and any positive integer $n$ [10,43]. Note that for the exponential family, the loss function is the negative log-likelihood, which is strongly convex over $\Theta_s$; hence, we have the logarithmic regret property.

Algorithm 1 Online mirror descent for non-anticipating estimators.

Require: Exponential family specifications $\phi(x)$, $F(\theta)$ and $f_\theta(x)$; initial parameter value $\theta_0$; sequence of data $X_1, \ldots, X_t, \ldots$; a closed, convex parameter set $G \subseteq \Theta_s$; a decreasing sequence $\{\eta_t\}$ of strictly positive step sizes.
1: $\hat\theta_0 = \theta_0$, $L_0 = 1$. {Initialization}
2: for all $t = 1, 2, \ldots,$ do
3:   Acquire a new observation $X_t$
4:   Compute the loss $\ell_t(\hat\theta_{t-1}) := -\log f_{\hat\theta_{t-1}}(X_t) = F(\hat\theta_{t-1}) - \hat\theta_{t-1}^\top \phi(X_t)$
5:   Compute the likelihood ratio $L_t = L_{t-1} f_{\hat\theta_{t-1}}(X_t)/f_{\theta_0}(X_t)$
6:   $\hat\mu_{t-1} = \nabla F(\hat\theta_{t-1})$, $\hat\mu_t = \hat\mu_{t-1} - \eta_t(\hat\mu_{t-1} - \phi(X_t))$ {Dual update}
7:   $\tilde\theta_t = (\nabla F)^{-1}(\hat\mu_t)$
8:   $\hat\theta_t = \arg\min_{u \in G} B_F(u, \tilde\theta_t)$ {Projected primal update}
9: end for
10: return $\{\hat\theta_t\}$ and $\{L_t\}$.

3. Nearly Second-Order Asymptotic Optimality of One-Sample Update Schemes

Below, we prove the nearly second-order asymptotic optimality of the one-sample update schemes. More precisely, nearly second-order asymptotic optimality means that the algorithm attains the lower performance bound asymptotically, up to a log-log factor in the false-alarm rate, as the false-alarm rate tends to zero (in many cases the log-log factor is a small number). We first introduce some necessary notation. Denote by $P_{\theta,n}$ and $\mathbb{E}_{\theta,n}$ the probability measure and expectation when the change occurs at time $n$ and the post-change parameter is $\theta$, i.e., when $X_1, \ldots, X_n$ are i.i.d. random variables with density $f_{\theta_0}$ and $X_{n+1}, X_{n+2}, \ldots$ are i.i.d. random variables with density $f_\theta$. Moreover, let $P_\infty$ and $\mathbb{E}_\infty$ denote the probability measure and expectation when there is no change, i.e., $X_1, X_2, \ldots$ are i.i.d. random variables with density $f_{\theta_0}$. Finally, let $\mathcal{F}_t$ denote the $\sigma$-algebra generated by $X_1, \ldots, X_t$ for $t \geq 1$.

3.1. One-Sided Sequential Hypothesis Test

Recall that the decision rule for the sequential hypothesis test is the stopping time $\tau(b)$ defined in (3). The two standard performance metrics are the false-alarm rate, denoted by $P_\infty(\tau(b) < \infty)$, and the expected detection delay (i.e., the expected number of samples needed to reject the null), denoted by $\mathbb{E}_{\theta,0}[\tau(b)]$. A meaningful test should have both a small $P_\infty(\tau(b) < \infty)$ and a small $\mathbb{E}_{\theta,0}[\tau(b)]$.
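For concreteness, the steps of Algorithm 1 can be sketched for a unit-variance Gaussian mean, where $\phi(x) = x$, $F(\theta) = \theta^2/2$ and $\nabla F(\theta) = \theta$, so the primal and dual variables coincide and the Bregman projection onto an interval $G$ reduces to clipping. This is our illustrative sketch, not the paper's code; with $\eta_t = 1/t$ and no clipping, the estimator is the running mean.

```python
def omd_gaussian(samples, theta0=0.0, G=(-5.0, 5.0)):
    """Algorithm 1 sketch for a unit-variance Gaussian mean (natural
    parameter theta; phi(x) = x, F(theta) = theta^2/2, grad F = identity).

    Returns the estimator sequence {theta_hat_t} and {log L_t}."""
    theta, logL = theta0, 0.0
    ests, logLs = [], []
    for t, x in enumerate(samples, start=1):
        # step 5: likelihood ratio uses the previous (non-anticipating) estimate
        logL += (theta - theta0) * (x - (theta + theta0) / 2.0)
        mu = theta                        # step 6: dual variable mu = grad F(theta)
        mu -= (1.0 / t) * (mu - x)        # dual update with eta_t = 1/t
        theta = min(max(mu, G[0]), G[1])  # steps 7-8: primal update + projection
        ests.append(theta)
        logLs.append(logL)
    return ests, logLs
```

With these choices the update visibly reproduces the recursive MLE of Section 1.1 whenever the iterate stays inside $G$, while the clip shows how a constraint set enters through the projected primal step.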
Usually, one adjusts the threshold $b$ to control the false-alarm rate below a certain level. Our main result is the following. As observed in [23], there is a loss of statistical efficiency in using one-sample update estimators relative to the GLR approach, which uses all past samples $X_1, \ldots, X_t$. The theorem below shows that this loss corresponds to the expected regret defined in (12).

Theorem 1 (Upper bound for OCO-based SPRT). Let $\{\hat\theta_t\}$ be a sequence of non-anticipating estimators generated by an OCO algorithm $a$. As $b \to \infty$,

  $\mathbb{E}_{\theta,0}[\tau(b)] \leq \frac{b}{I(\theta, \theta_0)} + \frac{\mathbb{E}_{\theta,0}[R^a_{\tau(b)}]}{I(\theta, \theta_0)} + O(1)$.  (14)

Here $O(1)$ is a term upper-bounded by an absolute constant as $b \to \infty$. The main idea of the proof is to decompose the statistic $\log L_t$ defining $\tau(b)$ into a few terms that form martingales, and then invoke Wald's theorem for the stopped process.

Remark 3. The inequality (14) is valid for any sequence of non-anticipating estimators generated by an OCO algorithm. Moreover, (14) gives an explicit connection between the expected detection delay of the one-sided sequential hypothesis test (the left-hand side of (14)) and the regret of the OCO algorithm (the second term on the right-hand side of (14)). This clearly illustrates the impact of estimation on detection through an estimation-algorithm-dependent factor. Note that in the statement of the theorem, the stopping time $\tau(b)$ appears on the right-hand side of the inequality (14). For OMD, the expected sample size is usually small. By comparing with a specific regret bound $R_{\tau(b)}$, we can bound $\mathbb{E}_{\theta,0}[\tau(b)]$, as discussed in Section 4. The most important case is when the estimation algorithm has a logarithmic expected regret. For the exponential family, as shown in Section 3.3, Algorithm 1 can achieve $\mathbb{E}_{\theta,0}[R_n] \leq C \log n$ for any positive integer $n$. To obtain a more specific order for the upper bound on $\mathbb{E}_{\theta,0}[\tau(b)]$ as $b$ grows, we establish an upper bound on $\mathbb{E}_{\theta,0}[\tau(b)]$ as a function of $b$, yielding the following corollary.

Corollary 1. Let $\{\hat\theta_t\}$ be a sequence of non-anticipating estimators generated by an OCO algorithm $a$. Assume that $\mathbb{E}_{\theta,0}[R^a_n] \leq C \log n$ for any positive integer $n$ and some constant $C > 0$. Then

  $\mathbb{E}_{\theta,0}[\tau(b)] \leq \frac{b}{I(\theta, \theta_0)} + \frac{C \log b}{I(\theta, \theta_0)}(1 + o(1))$.  (15)

Here $o(1)$ is a vanishing term as $b \to \infty$. Corollary 1 shows that, beyond the well-known first-order approximation $b/I(\theta, \theta_0)$ [8,18], the expected detection delay $\mathbb{E}_{\theta,0}[\tau(b)]$ is bounded by an additional term of order $\log b$ when the estimation algorithm has logarithmic regret. This $\log b$ term plays an important role in establishing the optimality properties later. To show the optimality properties of the detection procedures, we first select a set of detection procedures with false-alarm rates lower than a prescribed value, and then prove that among all procedures in the set, the expected detection delays of our proposed procedures are nearly the smallest.
Thus, we can choose a threshold $b$ to uniformly control the false-alarm rate of $\tau(b)$.

Lemma 1 (False-alarm rate of $\tau(b)$). Let $\{\hat\theta_t\}$ be any sequence of non-anticipating estimators. For any $b > 0$, $P_\infty(\tau(b) < \infty) \leq \exp(-b)$.

Lemma 1 shows that as $b$ increases, the false-alarm rate of $\tau(b)$ decays exponentially fast. We can set $b = \log(1/\alpha)$ to make the false-alarm rate of $\tau(b)$ less than some $\alpha > 0$. Next, leveraging an existing lower bound for the general SPRT presented in [14], we establish the nearly second-order asymptotic optimality of the OMD-based SPRT as follows.

Corollary 2 (Nearly second-order optimality of OCO-based SPRT). Let $\{\hat\theta_t\}$ be a sequence of non-anticipating estimators generated by an OCO algorithm $a$. Assume that $\mathbb{E}_{\theta,0}[R^a_n] \leq C \log n$ for any positive integer $n$ and some constant $C > 0$. Define the set $\mathcal{C}(\alpha) = \{T : P_\infty(T < \infty) \leq \alpha\}$. For $b = \log(1/\alpha)$, by Lemma 1, $\tau(b) \in \mathcal{C}(\alpha)$. For such a choice, $\tau(b)$ is nearly second-order asymptotically optimal in the sense that, for any $\theta \in \Theta_s \setminus \{\theta_0\}$, as $\alpha \to 0$,

  $\mathbb{E}_{\theta,0}[\tau(b)] - \inf_{T \in \mathcal{C}(\alpha)} \mathbb{E}_{\theta,0}[T] = O(\log\log(1/\alpha))$.  (16)

This result means that, compared with any procedure (including the optimal procedure) calibrated to have a false-alarm rate less than $\alpha$, our procedure incurs at most an $O(\log\log(1/\alpha))$ increase in the expected detection delay, which is usually a small number. For instance, even in the conservative case where we set $\alpha = 10^{-5}$ to control the false-alarm rate, $\log\log(1/\alpha) \approx 2.44$.
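The constant quoted above is a one-line computation (our arithmetic check, not from the paper's code):

```python
import math

# Optimality gap log(log(1/alpha)) at a conservative false-alarm level
alpha = 1e-5
gap = math.log(math.log(1.0 / alpha))  # log(1/alpha) ≈ 11.51, so gap ≈ 2.44
```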

Sequential Change-Point Detection

Now we proceed with the proof by leveraging the close connection [8] between sequential change-point detection and the one-sided hypothesis test. For sequential change-point detection, the two commonly used performance metrics [4] are the average run length (ARL), denoted by E_∞[T], and the maximal conditional average detection delay (CADD), denoted by sup_{n≥0} E_{θ,n}[T − n | T > n]. The ARL is the expected number of samples between two successive false alarms, and the CADD is the expected number of samples needed to detect the change after it occurs. A good procedure should have a large ARL and a small CADD. As in the one-sided hypothesis test, one usually chooses the threshold large enough that the ARL exceeds a pre-specified level. Similar to Theorem 1, we provide an upper bound for the CADD of our ASR and ACM procedures.

Theorem 2. Consider the change-point detection procedures T_ACM(b₁) in (7) and T_ASR(b₂) in (8). For any fixed k, let {θ̂_{k,t}} be a sequence of non-anticipating estimators generated by an OCO algorithm a. Let b₁ = b₂ = b; then as b → ∞ we have

    sup_{n≥0} E_{θ,n}[T_ASR(b) − n | T_ASR(b) > n] ≤ sup_{n≥0} E_{θ,n}[T_ACM(b) − n | T_ACM(b) > n]
        ≤ (I(θ, θ_0))⁻¹ (b + E_{θ,0}[R^a_{τ(b)}]) + O(1).    (17)

To prove Theorem 2, we relate the ASR and ACM procedures to the one-sided hypothesis test and use the fact that, when the measure P_∞ is known, sup_{n≥0} E_{θ,n}[T − n | T > n] is attained at n = 0 for both the ASR and ACM procedures. As above, we may apply an argument similar to Corollary 1 to remove the dependence on τ(b) on the right-hand side of the inequality.

We establish the following lower bound for the ARL of the detection procedures, which is needed to prove Corollary 3:

Lemma 2 (ARL). Consider the change-point detection procedures T_ACM(b₁) in (7) and T_ASR(b₂) in (8). For any fixed k, let {θ̂_{k,t}} be any sequence of non-anticipating estimators. Let b₁ = b₂ = b. Given a prescribed lower bound γ > 0 for the ARL, provided that b ≥ log γ we have
    E_∞[T_ACM(b)] ≥ E_∞[T_ASR(b)] ≥ γ.

Lemma 2 shows that, given a required lower bound γ for the ARL, we can choose b = log γ to make the ARL greater than γ. This is consistent with earlier works [8,25], which show that the smallest threshold b such that E_∞[T_ACM(b)] ≥ γ is approximately log γ. However, the bound in Lemma 2 is not tight, since in practice we can set b = ρ log γ for some ρ ∈ (0, 1) to ensure that the ARL is greater than γ. Combining the upper bound in Theorem 2 with an existing lower bound for the CADD of the SRRS procedure in [15], we obtain the following optimality properties.

Corollary 3 (Nearly second-order asymptotic optimality of ACM and ASR). Consider the change-point detection procedures T_ACM(b₁) in (7) and T_ASR(b₂) in (8). For any fixed k, let {θ̂_{k,t}} be a sequence of non-anticipating estimators generated by an OCO algorithm a. Assume that E_{θ,0}[R_n^a] ≤ C log n for any positive integer n and some constant C > 0. Let b₁ = b₂ = b. Define S(γ) = {T : E_∞[T] ≥ γ}. For b = log γ, due to Lemma 2, both T_ASR(b) and T_ACM(b) belong to S(γ). For such b, both T_ASR(b) and T_ACM(b) are nearly second-order asymptotically optimal in the sense that, for any θ ∈ Θ \ {θ_0},

    sup_n E_{θ,n}[T_ASR(b) − n + 1 | T_ASR(b) ≥ n] − inf_{T(b) ∈ S(γ)} sup_n E_{θ,n}[T(b) − n + 1 | T(b) ≥ n] = O(log log γ).    (18)

A similar expression holds for T_ACM(b). The result means that, compared with any procedure (including the optimal procedure) calibrated to have a fixed ARL larger than γ, our procedure incurs at most a log(log γ) increase in the CADD. Comparing (18) with (16), we note that the ARL γ plays the same role as 1/α, because 1/γ is roughly the false-alarm rate in sequential change-point detection [8].

Example: Regret Bound for Specific Cases

In this subsection, we show that the regret bound R_t can be expressed as a weighted sum of Bregman divergences between two consecutive estimators. This form of R_t is useful for establishing the logarithmic regret of OMD. The following result is a modification of [16].

Theorem 3. Assume that X_1, X_2, ... are i.i.d. random variables with density function f_θ(x). Let η_i = 1/i in Algorithm 1. Assume that {θ̂_i}_{i≥1}, {μ̂_i}_{i≥1} are obtained using Algorithm 1 and θ̂_i = θ̃_i (defined in steps 7 and 8 of Algorithm 1) for any i ≥ 1. Then for any θ_0 ∈ Θ and t ≥ 1,

    R_t = Σ_{i=1}^t i B_{Φ*}(μ̂_i, μ̂_{i−1}) = (1/2) Σ_{i=1}^t i (μ̂_i − μ̂_{i−1})ᵀ [∇²Φ*(μ̃_i)] (μ̂_i − μ̂_{i−1}),

where μ̃_i = λ μ̂_i + (1 − λ) μ̂_{i−1} for some λ ∈ (0, 1).

Next, we apply Theorem 3 to a concrete example. The multivariate normal distribution, denoted by N(θ, I_d), is parameterized by an unknown mean parameter θ and a known covariance matrix I_d (a d × d identity matrix). Following the notation in Section 2.3, we have φ(x) = x, dH(x) = |2π I_d|^{−1/2} exp(−xᵀx/2) dx, Θ = Θ_s = ℝ^d for any s < 2, Φ(θ) = (1/2)θᵀθ, μ = θ and Φ*(μ) = (1/2)μᵀμ, where |·| denotes the determinant of a matrix, and H is a probability measure under which the sample follows N(0, I_d). When the covariance matrix is known to be some Σ ≠ I_d, one can whiten the vectors by multiplying by Σ^{−1/2} to reduce to the situation here.

Corollary 4 (Upper bound on the expected regret, Gaussian). Assume X_1, X_2, ... are i.i.d. following N(θ, I_d) for some θ ∈ ℝ^d. Assume that {θ̂_i}_{i≥1}, {μ̂_i}_{i≥1} are obtained using Algorithm 1 with η_i = 1/i and Γ = ℝ^d. Then, for some constant C > 0 that depends on θ, we have

    E_{θ,0}[R_t] ≤ C d log t,   t ≥ 2.
The following calculations justify Corollary 4 and also serve as an example of how to use the regret bound. First, the assumption θ̂_t = θ̃_t in Theorem 3 is satisfied for the following reason: since Γ = ℝ^d is the full space, using the non-negativity of the Bregman divergence, Algorithm 1 gives θ̂_t = arg min_{u∈Γ} B_Φ(u, θ̃_t) = θ̃_t. The regret can then be written as

    R_t = (1/2)(μ̂_1 − μ̂_0)ᵀ(μ̂_1 − μ̂_0) + (1/2) Σ_{i=2}^t i (μ̂_i − μ̂_{i−1})ᵀ(μ̂_i − μ̂_{i−1})
        = (1/2)(X_1 − θ_0)ᵀ(X_1 − θ_0) + (1/2) Σ_{i=2}^t (μ̂_i − μ̂_{i−1})ᵀ(φ(X_i) − μ̂_{i−1}).

Since the step size is η_i = 1/i, we have μ̂_i − μ̂_{i−1} = (φ(X_i) − μ̂_{i−1})/i, so the second term in the above equation equals

    (1/2) Σ_{i=2}^t (1/i) ‖X_i − μ̂_{i−1}‖².

Expanding the squares and telescoping the resulting ‖μ̂_i‖² − ‖μ̂_{i−1}‖² terms, this sum is bounded by Σ_{i=2}^t (1/(2(i−1))) ‖X_i‖² plus lower-order terms. Combining the above, we have

    E_{θ,0}[R_t] ≤ (1/2) E_{θ,0}[(X_1 − θ_0)ᵀ(X_1 − θ_0)] + Σ_{i=2}^t (1/(2(i−1))) E_{θ,0}[‖X_i‖²] + (1/2) E_{θ,0}[‖X_1‖²].

Finally, since E_{θ,0}[‖X_i‖²] = d + ‖θ‖² for any i, we obtain the desired result. Thus, with i.i.d. multivariate normal samples, the expected regret grows logarithmically with the number of samples.

Using similar calculations, we can also bound the expected regret in the general case. As shown in the proof of Corollary 4 above, the dominating term of R_t can be rewritten as

    Σ_{i=2}^t (1/(2(i−1))) (φ(X_i) − μ̂_{i−1})ᵀ [∇²Φ*(μ̃_i)] (φ(X_i) + μ̂_{i−1}),

where μ̃_i is a convex combination of μ̂_i and μ̂_{i−1}. For an arbitrary distribution, the term (φ(X_i) − μ̂_{i−1})ᵀ [∇²Φ*(μ̃_i)] (φ(X_i) + μ̂_{i−1}) can be viewed as a local norm with changing curvature ∇²Φ*(μ̃_i). Thus, it is possible to prove O(log t)-style bounds case by case, by making more assumptions about the distributions. Recall the notation Θ_s from Section 2.3, over which −log f_θ(x) is s-strongly convex. Let ‖·‖₂ denote the ℓ2 norm. Moreover, assume that the true parameter belongs to a set Γ that is a closed and convex subset of Θ_s such that sup_{θ∈Γ} ‖∇Φ(θ)‖₂ ≤ M for some constant M. One can then show that −log f_θ(x) is not only s-strongly convex but also M-strongly smooth over Γ. Theorem 3 in [10] shows that if {θ̂_i}_{i≥1} is obtained by OMD, then for all θ ∈ Γ and n ≥ 1,

    E_{θ,0}[R_n] ≤ (M / (2s)) E_{θ,0}[max_{1≤i≤n} ‖X_i‖₂²] (log n + 1).

Therefore, for any bounded distribution within the exponential family, we achieve a logarithmic regret.
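The logarithmic growth can also be checked empirically. The sketch below uses the Gaussian setting of Corollary 4 with d = 1 and θ̂_0 = θ_0 = 0, where the OMD estimator with step size η_i = 1/i reduces to the running sample mean; the function name is ours, not from the paper:

```python
import numpy as np

def gaussian_omd_regret(x, theta0=0.0):
    """Empirical regret of OMD with step size eta_i = 1/i on N(theta, 1) data.

    With eta_i = 1/i the OMD estimator is the running sample mean (started at
    the null parameter theta0), and the per-step loss is the negative
    log-likelihood up to constants: l_i(u) = (x_i - u)**2 / 2.
    """
    mu_hat = theta0
    cum_loss = 0.0
    for i, xi in enumerate(x, start=1):
        cum_loss += 0.5 * (xi - mu_hat) ** 2   # predict x_i with mu_hat_{i-1}
        mu_hat += (xi - mu_hat) / i            # OMD update with eta_i = 1/i
    best_fixed = 0.5 * np.sum((x - np.mean(x)) ** 2)  # best fixed parameter in hindsight
    return cum_loss - best_fixed

rng = np.random.default_rng(0)
for n in (100, 1000, 10000):
    x = rng.normal(1.0, 1.0, size=n)
    # the ratio should stay roughly constant if the regret grows like C log n
    print(n, gaussian_omd_regret(x) / np.log(n))
```

On simulated data the ratio regret / log n stabilizes around a constant, consistent with the C d log t bound of Corollary 4.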
This logarithmic regret is valid for the Bernoulli distribution, the Beta distribution, and truncated versions of classic distributions (e.g., the truncated Gaussian, truncated Gamma, and truncated geometric distributions analyzed in [44]).

4. Numerical Examples

In this section, we present synthetic examples to demonstrate the good performance of our methods. We focus on ACM and ASR for sequential change-point detection. In the following, we consider the window-limited versions (see Remark 1) of ACM and ASR with window size w = 100. Recall that when the measure P_∞ is known, sup_{n≥0} E_{θ,n}[T − n | T > n] is attained at n = 0 for both the ASR and ACM procedures (a proof can be found in the proof of Theorem 2). Therefore, in the following experiments we define the expected detection delay (EDD) as E_{θ,0}[T] for a stopping time T. To compare the performance of different detection procedures, we determine the threshold for each procedure by Monte Carlo simulation such that the ARL of each procedure is about 10,000. Below,

we denote by ‖·‖₂, ‖·‖₁ and ‖·‖₀ the ℓ2 norm, the ℓ1 norm, and the ℓ0 norm (the number of non-zero entries), respectively. The following experiments are all run on the same MacBook Air with an Intel i7 Core CPU.

4.1. Detecting a Sparse Mean Shift in a Multivariate Normal Distribution

We consider detecting a sparse mean shift in a multivariate normal distribution. Specifically, we assume that the pre-change distribution is N(0, I_d) and the post-change distribution is N(θ, I_d) for some unknown θ ∈ {θ ∈ ℝ^d : ‖θ‖₀ ≤ s}, where s is called the sparsity of the mean shift. Sparse mean-shift detection is of particular interest in sensor networks [45,46]. In this Gaussian case, the Bregman divergence is given by B_Φ(θ_1, θ_2) = I(θ_2, θ_1) = ‖θ_1 − θ_2‖₂²/2. Therefore, the projection onto Γ in Algorithm 1 is a Euclidean projection onto a convex set, which in many cases can be implemented efficiently. As a frequently used convex relaxation of the ℓ0-norm ball, we set Γ = {θ : ‖θ‖₁ ≤ s} (it is known that imposing an ℓ1 constraint leads to sparse solutions; see, e.g., [47]). The projection onto the ℓ1 ball can then be computed very efficiently via a simple soft-thresholding technique [48].

Two benchmark procedures are CUSUM and GLR. For the CUSUM procedure, we specify a nominal post-change mean, which is an all-one vector. If the post-change mean is known to be sparse, we can also use the shrinkage estimator presented in [49], which performs hard- or soft-thresholding of the estimated post-change mean parameter. Our procedures are T_ASR(b) and T_ACM(b) with Γ = ℝ^d and Γ = {θ : ‖θ‖₁ ≤ 5}. In the following experiments, we run 10,000 Monte Carlo trials to obtain each simulated EDD. We set d = 20. The post-change distributions are N(θ, I_d), where 100p% of the entries of θ are 1 and the others are 0, and the locations of the non-zero entries are random. Table 1 shows the EDDs versus the proportion p. Note that our procedures incur little performance loss compared with the GLR and CUSUM procedures.
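The soft-thresholding projection onto the ℓ1 ball mentioned above can be sketched as follows (a standard sort-based routine; the function name and interface are ours, not from the paper):

```python
import numpy as np

def project_l1_ball(v, s):
    """Euclidean projection of v onto the l1 ball {x : ||x||_1 <= s}.

    Sort-based O(d log d) routine; the result is soft-thresholding of v
    with a data-dependent threshold theta.
    """
    if np.abs(v).sum() <= s:
        return v.copy()                      # already feasible
    u = np.sort(np.abs(v))[::-1]             # sorted magnitudes, descending
    css = np.cumsum(u)
    # largest index rho with u[rho] * (rho + 1) > css[rho] - s
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - s)[0][-1]
    theta = (css[rho] - s) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

print(project_l1_ball(np.array([3.0, 1.0]), 2.0))  # prints [2. 0.]
```

Each OMD iteration can then call this projection after the mirror-descent step, which keeps the running estimate inside Γ = {θ : ‖θ‖₁ ≤ s}.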
Notably, T_ACM(b) with Γ = {θ : ‖θ‖₁ ≤ 5} performs almost the same as the GLR procedure and much better than the CUSUM procedure when p is small. This shows the advantage of projection when the true parameter is sparse.

Table 1. Comparison of the EDDs in detecting a sparse mean shift of a multivariate Gaussian distribution. Below, "CUSUM": CUSUM procedure with a pre-specified all-one vector as the post-change parameter; "Shrinkage": component-wise shrinkage estimator in [49]; "GLR": GLR procedure; "ASR": T_ASR(b) with Γ = ℝ^d; "ACM": T_ACM(b) with Γ = ℝ^d; "ASR-ℓ1": T_ASR(b) with Γ = {θ : ‖θ‖₁ ≤ 5}; "ACM-ℓ1": T_ACM(b) with Γ = {θ : ‖θ‖₁ ≤ 5}. p is the proportion of non-zero entries in θ. We run 10,000 Monte Carlo trials to obtain each value. For each value, the standard deviation is less than one half of the value. [The numerical entries are not recoverable from the source; columns correspond to p = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6 and rows to CUSUM, Shrinkage, GLR, ASR, ACM, ASR-ℓ1, ACM-ℓ1.]

4.2. Detecting a Scale Change in a Gamma Distribution

We consider an example of detecting a scale change in Gamma distributions. Assume that we observe a sequence X_1, X_2, ... of samples drawn from Gamma(α, β) for some α, β > 0, with probability density function f_{α,β}(x) = exp(−xβ) x^{α−1} β^α / Γ(α) (to avoid confusion with the Γ parameter in Algorithm 1, we use Γ(·) to denote the Gamma function). The parameter α is called the dispersion parameter, which scales the loss and the divergences. For simplicity, we fix α = 1, just as we fixed the variance in the Gaussian case. The specifications in Algorithm 1 are as follows: θ = −β,

Θ = (−∞, 0), φ(x) = x, dH(x) = 1, Φ(θ) = −log(−θ), μ = −1/θ and Φ*(μ) = −log μ − 1. Assume that the pre-change distribution is Gamma(1, 1) and the post-change distribution is Gamma(1, β) for some unknown β > 0. We compare our algorithms with CUSUM, GLR, and the non-anticipating estimator based on the method-of-moments (MOM) estimator in [8]. For the CUSUM procedure, we specify the post-change β to be 2. The results are shown in Table 2. CUSUM fails to detect the change when β = 0.1, which is far away from the pre-specified post-change parameter β = 2. We can see that the performance loss of the proposed ACM method compared with GLR and MOM is very small.

Table 2. Comparison of the EDDs in detecting a scale change in a Gamma distribution. Below, "CUSUM": CUSUM procedure with pre-specified post-change parameter β = 2; "MOM": method-of-moments estimator; "GLR": GLR procedure; "ASR": T_ASR(b) with Γ = (−∞, 0); "ACM": T_ACM(b) with Γ = (−∞, 0). We run 10,000 Monte Carlo trials to obtain each value. For each value, the standard deviation is less than one half of the value. [The numerical entries are not recoverable from the source; columns correspond to β = 0.1, 0.5, 2, 5, 10, and the CUSUM entry for β = 0.1 is NaN.]

4.3. Communication-Rate Change Detection with an Erdős–Rényi Model

Next, we consider the problem of detecting a change in the communication rates of a network, which is a model for social network data. Suppose we observe communications between nodes in a network over time, represented as a sequence of (symmetric) adjacency matrices. At time t, if node i and node j communicate, then the adjacency matrix has 1 in its ij-th and ji-th entries (thus it forms an undirected graph). The entries corresponding to nodes that do not communicate are 0. We model such communication patterns using the Erdős–Rényi random graph model: each edge has a fixed probability of being present or absent, independently of the other edges. Under the null hypothesis, each edge is a Bernoulli random variable that takes value 1 with known probability p and value 0 with probability 1 − p.
Under the alternative hypothesis, there exists an unknown time κ after which a small subset of edges occurs with an unknown and different probability p′ ≠ p. In the experiments, we set N = 20 nodes and d = 190 edges. For the pre-change parameters, we set p_i = 0.2 for all i = 1, ..., d. For the post-change parameters, we randomly select n₁ out of the 190 edges, denoted by E, and set p_i = 0.8 for i ∈ E and p_i = 0.2 for i ∉ E. As before, we let the change happen at time n = 0 (since the upper bound for the EDD is achieved at n = 0, as argued in the proof of Theorem 2). To implement CUSUM, we specify the post-change parameters p_i = 0.8 for all i = 1, ..., d. The results are shown in Table 3. Our procedures are better than the CUSUM procedure when n₁ is small, since the post-change parameters used in the CUSUM procedure are then far from the true parameters. Compared with the GLR procedure, our methods have a small performance loss, and the loss is almost negligible as n₁ approaches d = 190.

Table 3. Comparison of the EDDs in detecting changes of the communication rates in a network. Below, "CUSUM": CUSUM procedure with pre-specified post-change parameters p = 0.8; "GLR": GLR procedure; "ASR": T_ASR(b) with Γ = ℝ; "ACM": T_ACM(b) with Γ = ℝ. We run 10,000 Monte Carlo trials to obtain each value. For each value, the standard deviation is less than one half of the value. [The numerical entries are not recoverable from the source; columns correspond to n₁ = 78, 100, 120, 150, 170, 190 and rows to CUSUM, GLR, ASR, ACM.]

Below are the specifications of Algorithm 1 in this case. For a Bernoulli distribution with unknown parameter p, the natural parameter θ equals log(p/(1 − p)). Thus, we have Θ = ℝ, φ(x) = x, dH(x) = 1, Φ(θ) = log(1 + exp(θ)), μ = exp(θ)/(1 + exp(θ)) and Φ*(μ) = μ log μ + (1 − μ) log(1 − μ).

4.4. Point-Process Change-Point Detection: Poisson to Hawkes Processes

In this example, to illustrate the situation in Section 1.2, we consider a case where a homogeneous Poisson process switches to a Hawkes process (see, e.g., [12]); this can be viewed as the simplest case of Section 1.2 with one node. We construct the ACM and ASR procedures. In this case, the MLE for the unknown post-change parameter cannot be found in closed form, yet ACM and ASR can easily be constructed and give reasonably good performance, although our theory no longer holds here due to the lack of i.i.d. samples.

The Hawkes process can be viewed as a non-homogeneous Poisson process whose intensity is influenced by historical events. The data consist of a sequence of events occurring at times {t_1, t_2, ..., t_n} before a time horizon T: t_i ≤ T. Assume the intensity of the Poisson process is λ_s, s ∈ (0, T), and there may exist a change-point κ ∈ (0, T) at which the process changes. The null and alternative hypothesis tests are

    H_0: λ_s = μ, 0 < s < T;
    H_1: λ_s = μ, 0 < s < κ;  λ_s = μ + θ Σ_{κ < t_j < s} ϕ(s − t_j), κ < s < T,

where μ is a known baseline intensity, θ > 0 is the unknown magnitude of the change, and ϕ(s) = β e^{−βs} is the normalized kernel function with pre-specified parameter β > 0, which captures the influence of past events. We treat the post-change influence parameter θ as unknown, since it represents an anomaly.

We first use a sliding window to convert the event times into a sequence of vectors with overlapping events. Assume the size of the sliding window is L. For a given scanning time T_i ≤ T, we map all the events in [T_i − L, T_i] to a vector X_i = [t_(1), ..., t_(m_i)], t_(i) ∈ [T_i − L, T_i], where m_i is the number of events falling into the window. Note that X_i can have different lengths for different i.
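As a quick numeric check of the Bernoulli specification given earlier in this subsection (the function name is ours), the conjugate pair (Φ, Φ*) should satisfy the Legendre–Fenchel identity Φ*(μ) = θμ − Φ(θ) at μ = ∇Φ(θ):

```python
import math

def bernoulli_duality_gap(p):
    """Bernoulli(p) in natural/mean parameterization; returns the
    Legendre-Fenchel duality gap, which should be ~0."""
    theta = math.log(p / (1.0 - p))                   # natural parameter
    log_partition = math.log(1.0 + math.exp(theta))   # Phi(theta)
    mu = math.exp(theta) / (1.0 + math.exp(theta))    # mean parameter, grad Phi(theta)
    dual = mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu)  # Phi*(mu)
    # duality: Phi*(mu) = theta * mu - Phi(theta) at mu = grad Phi(theta)
    return abs(dual - (theta * mu - log_partition))

print(bernoulli_duality_gap(0.2))  # ~0 up to floating-point error
```

The same pattern (natural parameter, log-partition, mean map, dual) is what Algorithm 1 needs for any exponential-family member.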
Consider a set of scanning times T_1, T_2, ..., T_t. This maps the event times into a sequence of vectors X_1, X_2, ..., X_t of lengths m_1, m_2, ..., m_t. The scanning times can be arbitrary; here we set them to be the event times, so that there is at least one sample per sliding window. For a hypothetical change-point location κ, it can be shown that the log-likelihood ratio (between the Hawkes process and the Poisson process), as a function of θ, is given by

    ℓ(θ | X_i) = Σ_{t_q ∈ (T_i−L, T_i]} log[μ + θ Σ_{t_j ∈ (T_i−L, t_q)} β e^{−β(t_q − t_j)}] − μL − θ Σ_{t_q ∈ (T_i−L, T_i]} [1 − e^{−β(T_i − t_q)}].    (19)

Based on this sliding-window approach, we can approximate the original change-point detection problem as follows. Without a change, X_1, ..., X_t are sampled from a Poisson process. Under the alternative, the change occurs at some time such that X_1, ..., X_κ are sampled from a Poisson process and X_{κ+1}, ..., X_t are sampled from a Hawkes process with parameter θ, rather than a Poisson process. We define the estimator of θ, for an assumed change-point location k, as

    θ̂_{k,i} := θ̂_{k,i}(X_k, ..., X_i) = θ̂_{k,i}(t_ℓ ∈ [T_k, T_i]).    (20)

Now, consider k ∈ [i − w, i − 1], and keep w estimators: θ̂_{i−w,i}, ..., θ̂_{i−1,i}. The update of each estimator is based on stochastic gradient descent. Taking the derivative with respect to θ, we have

    ∂ℓ(θ | X_i)/∂θ = Σ_{t_q ∈ (T_i−L, T_i]} [Σ_{t_j ∈ (T_i−L, t_q)} β e^{−β(t_q − t_j)}] / [μ + θ Σ_{t_j ∈ (T_i−L, t_q)} β e^{−β(t_q − t_j)}] − Σ_{t_q ∈ (T_i−L, T_i]} [1 − e^{−β(T_i − t_q)}].

Note that there is no closed-form expression for the MLE, which is the root of the above equation. We therefore perform a stochastic-gradient step (an ascent step on the log-likelihood ratio ℓ) instead:

    θ̂_{k,i+1} = θ̂_{k,i} + γ · ∂ℓ(θ | X_{i+1})/∂θ |_{θ = θ̂_{k,i}},   k = i − w + 1, ..., i,

where γ > 0 is the step size. We can now apply the ACM and ASR procedures, using the fact that log(f_{θ̂_{k,t}}(X_{t+1}) / f_{θ_0}(X_{t+1})) = ℓ(θ̂_{k,t} | X_{t+1}), computed via (19). Table 4 shows the EDD for different values of θ. Here we choose the threshold such that the ARL is 5000. We see that the scheme has reasonably good performance: the detection delay decreases as the true signal strength θ increases.

Table 4. Point-process change-point detection: EDD of the ACM and ASR procedures for various values of the true θ; the ARL of each procedure is controlled to be 5000 by selecting the threshold via Monte Carlo simulation. [The numerical entries are not recoverable from the source; columns correspond to θ = 0.4, 0.5, 0.6, 0.7 and rows to ACM and ASR.]

5. Conclusions

In this paper, we considered sequential hypothesis testing and change-point detection with computationally efficient one-sample-update schemes obtained from online mirror descent. We showed that the loss of statistical efficiency caused by using the online mirror descent estimator (in place of the exact maximum likelihood estimator computed from the complete historical data) is related to the regret incurred by the online convex optimization procedure. The result generalizes to any estimation method with a logarithmic regret bound, and it sheds light on the relationship between statistical detection procedures and online convex optimization.

Acknowledgments: This research was supported in part by National Science Foundation (NSF) grants NSF CCF, CMMI, and NSF CAREER CCF to Yao Xie. We would like to thank the anonymous reviewers for their insightful comments.

Author Contributions: Yang Cao, Yao Xie, and Huan Xu conceived the idea and performed the theoretical part of the paper; Liyan Xie helped with the numerical examples of the manuscript.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A.
Proofs

Proof of Theorem 1. In the proof, for simplicity of notation, we use N to denote τ(b). Recall that θ is the true parameter. Define

    S_t^θ = Σ_{i=1}^t log(f_θ(X_i) / f_{θ_0}(X_i)).

Then, under the measure P_{θ,0}, S_t^θ is a random walk with i.i.d. increments, and by Wald's identity (e.g., [1]) we have

    E_{θ,0}[S_N^θ] = E_{θ,0}[N] · I(θ, θ_0).    (A1)

On the other hand, let θ*_N denote the MLE based on (X_1, ..., X_N). The key to the proof is to decompose the stopped process S_N^θ into a sum of three terms as follows:

    S_N^θ = Σ_{i=1}^N log(f_θ(X_i)/f_{θ*_N}(X_i)) + Σ_{i=1}^N log(f_{θ*_N}(X_i)/f_{θ̂_{i−1}}(X_i)) + Σ_{i=1}^N log(f_{θ̂_{i−1}}(X_i)/f_{θ_0}(X_i)).    (A2)

Note that the first term of the decomposition on the right-hand side of (A2) is always non-positive, since

    Σ_{i=1}^N log(f_θ(X_i)/f_{θ*_N}(X_i)) = Σ_{i=1}^N log f_θ(X_i) − sup_{θ'∈Θ} Σ_{i=1}^N log f_{θ'}(X_i) ≤ 0.

Therefore we have

    E_{θ,0}[S_N^θ] ≤ E_{θ,0}[Σ_{i=1}^N log(f_{θ*_N}(X_i)/f_{θ̂_{i−1}}(X_i))] + E_{θ,0}[Σ_{i=1}^N log(f_{θ̂_{i−1}}(X_i)/f_{θ_0}(X_i))].

Now consider the third term in the decomposition (A2). Similar to the proof of Equation (5.109) in [4], we claim that its expectation under the measure P_{θ,0} is upper bounded by b/I(θ, θ_0) + O(1) as b → ∞. Next, we prove the claim. For any positive integer n, we further decompose the third term in (A2) as

    Σ_{i=1}^n log(f_{θ̂_{i−1}}(X_i)/f_{θ_0}(X_i)) = M_n(θ) − G_n(θ) + m_n(θ, θ_0) + n I(θ, θ_0),    (A3)

where

    M_n(θ) = Σ_{i=1}^n log(f_{θ̂_{i−1}}(X_i)/f_θ(X_i)) + G_n(θ),   G_n(θ) = Σ_{i=1}^n I(θ, θ̂_{i−1}),
    m_n(θ, θ_0) = Σ_{i=1}^n log(f_θ(X_i)/f_{θ_0}(X_i)) − n I(θ, θ_0).

The decomposition (A3) involves the stochastic processes {M_n(θ)} and {m_n(θ, θ_0)}, which are both P_{θ,0}-martingales with zero expectation, i.e., E_{θ,0}[M_n(θ)] = E_{θ,0}[m_n(θ, θ_0)] = 0 for any positive integer n. Since, for the exponential family, the log-partition function Φ(θ) is bounded, by the inequalities for martingales [47] we have

    E_{θ,0}|M_n(θ)| ≤ C_1 √n,   E_{θ,0}|m_n(θ, θ_0)| ≤ C_2 √n,    (A4)

where C_1 and C_2 are absolute constants that do not depend on n. Moreover, we observe that for all θ ∈ Θ,

    E_{θ,0}[G_n(θ)] ≤ E_{θ,0}[max_{θ̃∈Θ} G_n(θ̃)] = E_{θ,0}[R_n(θ)] ≤ C log n.

Therefore, applying (A4), we have that n⁻¹G_n(θ), n⁻¹M_n(θ) and n⁻¹m_n(θ, θ_0) converge to 0 almost surely. Moreover, the convergence is P_{θ,0}-r-quick for r = 1. We say that n⁻¹A_n converges P_{θ,0}-r-quickly to a constant I if E_{θ,0}[G(ε)]^r < ∞ for all ε > 0, where G(ε) = sup{n ≥ 1 : |n⁻¹A_n − I| > ε} is the last time n⁻¹A_n leaves the interval [I − ε, I + ε] (for more details, we refer the reader to [4]).
Therefore, dividing both sides of (A3) by n, we obtain that n⁻¹ Σ_{i=1}^n log(f_{θ̂_{i−1}}(X_i)/f_{θ_0}(X_i)) converges P_{θ,0}-1-quickly to I(θ, θ_0). For ε > 0, we now define the last entry time

    L(ε) = sup{n ≥ 1 : |n I(θ, θ_0) − Σ_{i=1}^n log(f_{θ̂_{i−1}}(X_i)/f_{θ_0}(X_i))| > εn}.

By the definition of P_{θ,0}-1-quick convergence and the finiteness of I(θ, θ_0), we have that E_{θ,0}[L(ε)] < +∞ for all ε > 0. In the following, define the scaled threshold b̃ = b/I(θ, θ_0). Observe that, conditioning on the event {L(ε) + 1 < N < +∞}, we have

    (1 − ε)(N − 1) I(θ, θ_0) < Σ_{i=1}^{N−1} log(f_{θ̂_{i−1}}(X_i)/f_{θ_0}(X_i)) < b.

Therefore, conditioning on the event {L(ε) + 1 < N < +∞}, we have N < 1 + b̃/(1 − ε). Hence, for any 0 < ε < 1,

    N ≤ 1 + I{N > L(ε) + 1} · b̃/(1 − ε) + I{N ≤ L(ε) + 1} · L(ε) ≤ 1 + b̃/(1 − ε) + L(ε).    (A5)

Since E_{θ,0}[L(ε)] < ∞ for any ε > 0, it follows from (A5) that the third term in (A2) is upper bounded by b̃ + O(1). Finally, the second term in (A2) can be written as

    Σ_{i=1}^N log(f_{θ*_N}(X_i)/f_{θ̂_{i−1}}(X_i)) = Σ_{i=1}^N [−log f_{θ̂_{i−1}}(X_i)] − inf_{θ∈Θ} Σ_{i=1}^N [−log f_θ(X_i)],

which is exactly the regret R_N defined in (2) for the online estimators, when the loss function is the negative log-likelihood. The theorem is then proved by combining the above analysis of the three terms in (A2) with (A1).

Proof of Corollary 1. First, we relate the expected regret at the stopping time to the expected stopping time, using the following chain of (in)equalities:

    E_{θ,0}[R_{τ(b)}] = E_{θ,0}[E_{θ,0}[R_n | τ(b) = n]] ≤ E_{θ,0}[C log τ(b)] ≤ C log E_{θ,0}[τ(b)],    (A6)

where the first equality uses iterated expectation, the first inequality uses the logarithmic-regret assumption in the statement of Corollary 1, and the second inequality follows from Jensen's inequality. Let a = (b + O(1))/I(θ, θ_0), β = C/I(θ, θ_0) and x = E_{θ,0}[τ(b)]. Applying (A6), the upper bound in Equation (14) becomes x ≤ a + β log x. From this we have x ≤ O(a). Taking logarithms on both sides and using the fact that max{a_1, a_2} ≤ a_1 + a_2 ≤ 2 max{a_1, a_2} for a_1, a_2 ≥ 0,

    log x ≤ max{log(2a), log(2β log x)} ≤ log a + O(log β).

Therefore, x ≤ a + β(log a + O(log β)). Using this argument, we obtain

    E_{θ,0}[τ(b)] ≤ b/I(θ, θ_0) + (C log b)/I(θ, θ_0) · (1 + o(1)).    (A7)

Note that a similar argument can be found in [49].
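The bootstrapping step above, which turns the implicit bound x ≤ a + β log x into the explicit bound behind (A7), can be illustrated numerically; the constants below are arbitrary stand-ins for (b + O(1))/I(θ, θ_0) and C/I(θ, θ_0), for illustration only:

```python
import math

a, beta = 1000.0, 5.0  # arbitrary stand-ins for (b + O(1))/I and C/I

x = a
for _ in range(100):
    x = a + beta * math.log(x)   # iterate the implicit bound x <= a + beta*log(x)

explicit = a + beta * math.log(a)  # the explicit first-plus-log-order bound
print(x, explicit)                 # the two agree up to a lower-order term
```

The fixed point of the implicit bound and the explicit value a + β log a differ only by a term of order β log β, matching the O(log β) slack in the argument.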
Next, we establish a few lemmas useful for proving Theorem 2 for the sequential detection procedures. Define a measure Q on (X^∞, B^∞) under which the probability density of X_i conditional on F_{i−1} is f_{θ̂_{i−1}}. Then, for any event A ∈ F_i, we have Q(A) = ∫_A L_i dP_∞. The following lemma shows that the restriction of Q to F_i is well defined.

Lemma A1. Let Q_i be the restriction of Q to F_i. Then for any A ∈ F_k and any i ≥ k, Q_i(A) = Q_k(A).

Proof of Lemma 1. To bound the term P_∞(τ(b) < ∞), we take advantage of the martingale property of L_t in (2). The main technique is a combination of a change of measure and Wald's likelihood ratio identity [1]. The proof combines results from [23] and [8], and the reader can find a complete proof in [23]; for completeness we reproduce those arguments here.

Define L̄_i = dP_i/dQ_i as the Radon–Nikodym derivative, where P_i and Q_i are the restrictions of P_∞ and Q to F_i, respectively. Then L̄_i = (L_i)⁻¹ for any i ≥ 1 (recall that L_i is defined in (2)). Combining Lemma A1 with Wald's likelihood ratio identity, we have

    P_∞(A ∩ {τ(b) < ∞}) = E_Q[I(A ∩ {τ(b) < ∞}) · L̄_{τ(b)}],   ∀A ∈ F_{τ(b)},    (A8)

where I(E) is the indicator function, equal to 1 for ω ∈ E and 0 otherwise. By the definition of τ(b), we have L̄_{τ(b)} ≤ exp(−b). Taking A to be the whole sample space in (A8) proves that P_∞(τ(b) < ∞) ≤ exp(−b).

Proof of Corollary 2. We use (5.80) and (5.88) in [4], which concern the asymptotic performance of open-ended tests. Since our problem is a special case of the problem treated in [4], we obtain

    inf_{T ∈ C(α)} E_{θ,0}[T] = log(1/α)/I(θ, θ_0) + (log log(1/α))/(2 I(θ, θ_0)) · (1 + o(1)).

Combining the above result with the right-hand side of (15) proves the corollary.

Proof of Theorem 2. From (A10), we have that for any n ≥ 1,

    E_{θ,n}[T_ASR(b) − n | T_ASR(b) > n] ≤ E_{θ,n}[T_ACM(b) − n | T_ACM(b) > n].

Therefore, to prove the theorem using Theorem 1, it suffices to show that

    sup_{n≥0} E_{θ,n}[T_ACM(b) − n | T_ACM(b) > n] ≤ E_{θ,0}[τ(b)].

Using an argument similar to the remarks in [8], the supremum of the detection delay over all change locations is achieved when the change occurs at the first instant:

    sup_{n≥0} E_{θ,n}[T_ACM(b) − n | T_ACM(b) > n] = E_{θ,0}[T_ACM(b)].    (A9)

This is a slight modification (a small change of subscripts) of the remarks in [8], but for completeness and clarity we give the details. Notice that, since θ_0 is known, for any j ≥ 1 the distribution of {max_{j+1≤k≤t} L_{k,t}}_{t=j+1}^∞ under P_{θ,j}, conditional on F_j, is the same as the distribution of {max_{1≤k≤t} L_{k,t}}_{t=1}^∞ under P_{θ,0}. Below, we use a renewal property of the ACM procedure. Define

    T_ACM^{(j)}(b) = inf{t > j : max_{j+1≤k≤t} log L_{k,t} > b}.

Then we have

    E_{θ,0}[T_ACM(b)] = E_{θ,j}[T_ACM^{(j)}(b) − j | T_ACM^{(j)}(b) > j].
However, max_{j+1≤k≤t} log L_{k,t} ≤ max_{1≤k≤t} log L_{k,t} for any t > j, so T_ACM^{(j)}(b) ≥ T_ACM(b) conditioning on {T_ACM(b) > j}. Hence

    E_{θ,0}[T_ACM(b)] = E_{θ,j}[T_ACM^{(j)}(b) − j | T_ACM^{(j)}(b) > j] ≥ E_{θ,j}[T_ACM(b) − j | T_ACM(b) > j].

Thus, to prove (A9), it suffices to show that E_{θ,0}[T_ACM(b)] ≤ E_{θ,0}[τ(b)]. To show this, define τ(b)^{(ℓ)} as the stopping time obtained by applying the one-sided sequential hypothesis testing procedure τ(b) to the data {X_i}_{i=ℓ}^∞. Then in fact T_ACM(b) = min_{ℓ≥1} {τ(b)^{(ℓ)} + ℓ − 1}; this relationship was developed in [8]. Thus T_ACM(b) ≤ τ(b)^{(1)} + 1 − 1 = τ(b), and E_{θ,0}[T_ACM(b)] ≤ E_{θ,0}[τ(b)].

Proof of Lemma 2. This is a classic result proved using the martingale property; the proof routine can be found in many textbooks, such as [4]. First, rewrite T_ASR(b) as

    T_ASR(b) = inf{t ≥ 1 : log(Σ_{k=1}^t L_{k,t}) > b}.

Next, since

    log(Σ_{k=1}^t L_{k,t}) > log(max_{1≤k≤t} L_{k,t}) = max_{1≤k≤t} log L_{k,t},    (A10)

we have E_∞[T_ACM(b)] ≥ E_∞[T_ASR(b)]. So it suffices to show that E_∞[T_ASR(b)] ≥ γ if b ≥ log γ. Define R̃_t = Σ_{k=1}^t L_{k,t} − t. A direct computation shows that

    E_∞[R̃_t | F_{t−1}] = E_∞[L_{t,t} + Σ_{k=1}^{t−1} L_{k,t} − t | F_{t−1}]
        = 1 + Σ_{k=1}^{t−1} L_{k,t−1} · E_∞[f_{θ̂_{t−1}}(X_t)/f_{θ_0}(X_t) | F_{t−1}] − t
        = 1 + Σ_{k=1}^{t−1} L_{k,t−1} − t = R̃_{t−1}.

Therefore, {R̃_t} is a (P_∞, F_t)-martingale with zero mean. Suppose that E_∞[T_ASR(b)] < ∞ (otherwise the statement is trivial); then we have

    Σ_{t=1}^∞ P_∞(T_ASR(b) ≥ t) < ∞.    (A11)

Equation (A11) implies that P_∞(T_ASR(b) ≥ t) = o(1/t). Combining this with the fact that 0 ≤ Σ_{k=1}^t L_{k,t} ≤ exp(b) conditioning on the event {T_ASR(b) > t}, so that |R̃_t| ≤ exp(b) + t there, we have

    lim inf_{t→∞} ∫_{{T_ASR(b)>t}} |R̃_t| dP_∞ ≤ lim inf_{t→∞} (exp(b) + t) P_∞(T_ASR(b) ≥ t) = 0.

Therefore, we can apply the optional stopping theorem for martingales to obtain E_∞[R̃_{T_ASR(b)}] = 0, that is, E_∞[Σ_{k=1}^{T_ASR(b)} L_{k,T_ASR(b)}] = E_∞[T_ASR(b)]. By the definition of T_ASR(b), Σ_{k=1}^{T_ASR(b)} L_{k,T_ASR(b)} > exp(b), so E_∞[T_ASR(b)] > exp(b). Therefore, if b ≥ log γ, we have E_∞[T_ACM(b)] ≥ E_∞[T_ASR(b)] ≥ γ.

Proof of Corollary 3. Our Theorem 1 and the remarks in [15] show that the minimum worst-case detection delay, at a fixed ARL level γ, is

    inf_{T(b) ∈ S(γ)} sup_n E_{θ,n}[T(b) − n + 1 | T(b) ≥ n] = log γ / I(θ, θ_0) + (d log log γ)/(2 I(θ, θ_0)) · (1 + o(1)).    (A12)

It can be shown that the infimum is attained by choosing T(b) as a weighted Shiryayev–Roberts detection procedure, with a careful choice of the weight over the parameter space Θ. Combining (A12) with the right-hand side of (15) proves the corollary.

The following derivation borrows ideas from [16]. First, we derive concise forms of the two terms in the definition of R_t in (2).

Lemma A2. Assume that X_1, X_2, ... are i.i.d. random variables with density function f_θ(x), and assume the decreasing step size η_i = 1/i in Algorithm 1. Let {θ̂_i}_{i≥1}, {μ̂_i}_{i≥1} be generated by Algorithm 1. If θ̂_i = θ̃_i for any i ≥ 1, then for any null-distribution parameter θ_0 ∈ Θ and t ≥ 1,

    Σ_{i=1}^t [−log f_{θ̂_{i−1}}(X_i)] = Σ_{i=1}^t i B_{Φ*}(μ̂_i, μ̂_{i−1}) − t Φ*(μ̂_t).    (A13)

Moreover, for any t ≥ 1,

    inf_{θ∈Θ} Σ_{i=1}^t [−log f_θ(X_i)] = −t Φ*(μ̄_t),    (A14)

where μ̄_t = (1/t) Σ_{i=1}^t φ(X_i).

Subtracting (A14) from (A13), we obtain the following result, which shows that the regret can be represented as a weighted sum of Bregman divergences between two consecutive estimators.

Proof of Lemma A2. By the definition of the Legendre–Fenchel dual function, we have Φ*(μ) = θᵀμ − Φ(θ) whenever μ = ∇Φ(θ), θ ∈ Θ. By this definition, and choosing η_i = 1/i, we have for any i ≥ 1

    −log f_{θ̂_{i−1}}(X_i) = Φ(θ̂_{i−1}) − θ̂_{i−1}ᵀ φ(X_i) = −θ̂_{i−1}ᵀ (φ(X_i) − μ̂_{i−1}) − Φ*(μ̂_{i−1}) = −(1/η_i) θ̂_{i−1}ᵀ (μ̂_i − μ̂_{i−1}) − Φ*(μ̂_{i−1})
        = (1/η_i) B_{Φ*}(μ̂_i, μ̂_{i−1}) + (1/η_{i−1}) Φ*(μ̂_{i−1}) − (1/η_i) Φ*(μ̂_i),    (A15)

where we used the update rule in line 6 of Algorithm 1 and the assumption θ̂_i = θ̃_i to obtain the third equality, and we define 1/η_0 = 0. Summing (A15) over i from 1 to t, where the last two terms form a telescoping series, we have

    Σ_{i=1}^t [−log f_{θ̂_{i−1}}(X_i)] = Σ_{i=1}^t (1/η_i) B_{Φ*}(μ̂_i, μ̂_{i−1}) − (1/η_t) Φ*(μ̂_t) = Σ_{i=1}^t i B_{Φ*}(μ̂_i, μ̂_{i−1}) − t Φ*(μ̂_t).

Moreover, from the definition we have

    Σ_{i=1}^t [−log f_θ(X_i)] = Σ_{i=1}^t [Φ(θ) − θᵀ φ(X_i)].

Taking the derivative of Σ_{i=1}^t [−log f_θ(X_i)] with respect to θ and setting it to 0, we find the stationary point, at which

    ∇Φ(θ) = (1/t) Σ_{i=1}^t φ(X_i) = μ̄_t.

Similarly, using the expression for the dual function and plugging μ̄_t back into the equation, we have

    inf_{θ∈Θ} Σ_{i=1}^t [−log f_θ(X_i)] = t (Φ(θ) − θᵀ μ̄_t) = −t Φ*(μ̄_t).

Proof of Theorem 3. By choosing the step size η_i = 1/i for every i ≥ 1 in Algorithm 1, and assuming θ̂_i = θ̃_i for every i ≥ 1, we have by induction that

    μ̂_t = (1/t) Σ_{i=1}^t φ(X_i) = μ̄_t.

Subtracting (A14) from (A13), we obtain

    R_t = Σ_{i=1}^t [−log f_{θ̂_{i−1}}(X_i)] − inf_{θ∈Θ} Σ_{i=1}^t [−log f_θ(X_i)]
        = Σ_{i=1}^t i B_{Φ*}(μ̂_i, μ̂_{i−1}) − t Φ*(μ̂_t) + t Φ*(μ̄_t)
        = Σ_{i=1}^t i B_{Φ*}(μ̂_i, μ̂_{i−1})
        = Σ_{i=1}^t i [Φ*(μ̂_i) − Φ*(μ̂_{i−1}) − ⟨∇Φ*(μ̂_{i−1}), μ̂_i − μ̂_{i−1}⟩]
        = (1/2) Σ_{i=1}^t i (μ̂_i − μ̂_{i−1})ᵀ [∇²Φ*(μ̃_i)] (μ̂_i − μ̂_{i−1}).

The final equality is obtained by a Taylor expansion.

References

1. Siegmund, D. Sequential Analysis: Tests and Confidence Intervals; Springer: Berlin, Germany, 1985.
2. Chen, J.; Gupta, A.K. Parametric Statistical Change Point Analysis; Birkhäuser: Basel, Switzerland.
3. Siegmund, D. Change-points: From sequential detection to biology and back. Seq. Anal. 2013, 23.
4. Tartakovsky, A.; Nikiforov, I.; Basseville, M. Sequential Analysis: Hypothesis Testing and Changepoint Detection; CRC Press: Boca Raton, FL, USA, 2014.
5. Granjon, P. The CuSum Algorithm: A Small Review. Available online: https://hal.archives-ouvertes.fr/hal (accessed on 16 February 2018).
6. Basseville, M.; Nikiforov, I.V. Detection of Abrupt Changes: Theory and Application; Prentice Hall: Upper Saddle River, NJ, USA, 1993.
7. Lai, T.L. Information bounds and quick detection of parameter changes in stochastic systems. IEEE Trans. Inf. Theory 1998, 44.
8. Lorden, G.; Pollak, M. Nonanticipating estimation applied to sequential analysis and changepoint detection. Ann. Stat. 2005, 33.
9. Raginsky, M.; Marcia, R.F.; Silva, J.; Willett, R. Sequential probability assignment via online convex programming using exponential families. In Proceedings of the IEEE International Symposium on Information Theory, Seoul, Korea, 28 June–3 July 2009; IEEE: Piscataway, NJ, USA, 2009.
10. Raginsky, M.; Willett, R.; Horn, C.; Silva, J.; Marcia, R. Sequential anomaly detection in the presence of noise and limited feedback. IEEE Trans. Inf. Theory 2012, 58.
11. Peel, L.; Clauset, A. Detecting change points in the large-scale structure of evolving networks.
In Proceedings of he 29h AAAI Conference on Arificial Inelligence (AAAI), Ausin, TX, USA, January Li, S.; Xie, Y.; Farajabar, M.; Verma, A.; Song, L. Deecing weak changes in dynamic evens over neworks. IEEE Trans. Signal Inf. Process. Over New. 207, 3, Cesa-Bianchi, N.; Lugosi, G. Predicion, Learning, and Games; Cambridge Universiy Press: Cambridge, UK, Hazan, E. Inroducion o online convex opimizaion. Found. Trends Opim. 206, 2, Siegmund, D.; Yakir, B. Minimax opimaliy of he Shiryayev-Robers change-poin deecion rule. J. Sa. Plan. Inference 2008, 38, Azoury, K.; Warmuh, M. Relaive loss bounds for on-line densiy esimaion wih he exponenial family of disribuions. Mach. Learn. 200, 43,


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 38, No. 2, May 2013, pp. 209 227 ISSN 0364-765X (prin) ISSN 1526-5471 (online) hp://dx.doi.org/10.1287/moor.1120.0562 2013 INFORMS On Boundedness of Q-Learning Ieraes

More information

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation:

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation: M ah 5 7 Fall 9 L ecure O c. 4, 9 ) Hamilon- J acobi Equaion: Weak S oluion We coninue he sudy of he Hamilon-Jacobi equaion: We have shown ha u + H D u) = R n, ) ; u = g R n { = }. ). In general we canno

More information

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients Secion 3.5 Nonhomogeneous Equaions; Mehod of Undeermined Coefficiens Key Terms/Ideas: Linear Differenial operaor Nonlinear operaor Second order homogeneous DE Second order nonhomogeneous DE Soluion o homogeneous

More information

18 Biological models with discrete time

18 Biological models with discrete time 8 Biological models wih discree ime The mos imporan applicaions, however, may be pedagogical. The elegan body of mahemaical heory peraining o linear sysems (Fourier analysis, orhogonal funcions, and so

More information

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves Rapid Terminaion Evaluaion for Recursive Subdivision of Bezier Curves Thomas F. Hain School of Compuer and Informaion Sciences, Universiy of Souh Alabama, Mobile, AL, U.S.A. Absrac Bézier curve flaening

More information

13.3 Term structure models

13.3 Term structure models 13.3 Term srucure models 13.3.1 Expecaions hypohesis model - Simples "model" a) shor rae b) expecaions o ge oher prices Resul: y () = 1 h +1 δ = φ( δ)+ε +1 f () = E (y +1) (1) =δ + φ( δ) f (3) = E (y +)

More information

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19 Sequenial Imporance Sampling (SIS) AKA Paricle Filering, Sequenial Impuaion (Kong, Liu, Wong, 994) For many problems, sampling direcly from he arge disribuion is difficul or impossible. One reason possible

More information

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H.

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H. ACE 564 Spring 2006 Lecure 7 Exensions of The Muliple Regression Model: Dumm Independen Variables b Professor Sco H. Irwin Readings: Griffihs, Hill and Judge. "Dumm Variables and Varing Coefficien Models

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1 SZG Macro 2011 Lecure 3: Dynamic Programming SZG macro 2011 lecure 3 1 Background Our previous discussion of opimal consumpion over ime and of opimal capial accumulaion sugges sudying he general decision

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

CHAPTER 2 Signals And Spectra

CHAPTER 2 Signals And Spectra CHAPER Signals And Specra Properies of Signals and Noise In communicaion sysems he received waveform is usually caegorized ino he desired par conaining he informaion, and he undesired par. he desired par

More information

Matlab and Python programming: how to get started

Matlab and Python programming: how to get started Malab and Pyhon programming: how o ge sared Equipping readers he skills o wrie programs o explore complex sysems and discover ineresing paerns from big daa is one of he main goals of his book. In his chaper,

More information

) were both constant and we brought them from under the integral.

) were both constant and we brought them from under the integral. YIELD-PER-RECRUIT (coninued The yield-per-recrui model applies o a cohor, bu we saw in he Age Disribuions lecure ha he properies of a cohor do no apply in general o a collecion of cohors, which is wha

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

Bias-Variance Error Bounds for Temporal Difference Updates

Bias-Variance Error Bounds for Temporal Difference Updates Bias-Variance Bounds for Temporal Difference Updaes Michael Kearns AT&T Labs mkearns@research.a.com Sainder Singh AT&T Labs baveja@research.a.com Absrac We give he firs rigorous upper bounds on he error

More information

EKF SLAM vs. FastSLAM A Comparison

EKF SLAM vs. FastSLAM A Comparison vs. A Comparison Michael Calonder, Compuer Vision Lab Swiss Federal Insiue of Technology, Lausanne EPFL) michael.calonder@epfl.ch The wo algorihms are described wih a planar robo applicaion in mind. Generalizaion

More information

Empirical Process Theory

Empirical Process Theory Empirical Process heory 4.384 ime Series Analysis, Fall 27 Reciaion by Paul Schrimpf Supplemenary o lecures given by Anna Mikusheva Ocober 7, 28 Reciaion 7 Empirical Process heory Le x be a real-valued

More information

Games Against Nature

Games Against Nature Advanced Course in Machine Learning Spring 2010 Games Agains Naure Handous are joinly prepared by Shie Mannor and Shai Shalev-Shwarz In he previous lecures we alked abou expers in differen seups and analyzed

More information