Sequential Change-Point Detection via Online Convex Optimization

Yang Cao, Liyan Xie, Yao Xie * and Huan Xu

H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; caoyang@gatech.edu (Y.C.); lxie49@gatech.edu (L.X.); huan.xu@isye.gatech.edu (H.X.)
* Correspondence: yao.xie@isye.gatech.edu

Received: September 2017; Accepted: 5 February 2018; Published: 7 February 2018

Abstract: Sequential change-point detection when the distribution parameters are unknown is a fundamental problem in statistics and machine learning. When the post-change parameters are unknown, we consider a set of detection procedures based on sequential likelihood ratios with non-anticipating estimators constructed using online convex optimization algorithms such as online mirror descent, which provides a more versatile approach to tackling complex situations where recursive maximum likelihood estimators cannot be found. When the underlying distributions belong to an exponential family and the estimators satisfy the logarithmic regret property, we show that this approach is nearly second-order asymptotically optimal: the upper bound on the detection delay of the algorithm (for a false-alarm rate measured by the average run length) meets the lower bound asymptotically, up to a log-log factor, as the threshold tends to infinity. Our proof is achieved by making a connection between sequential change-point detection and online convex optimization and leveraging the logarithmic regret bound property of the online mirror descent algorithm. Numerical and real data examples validate our theory.

Keywords: sequential methods; change-point detection; online algorithms

1. Introduction

Sequential analysis is a classic topic in statistics concerning online inference from a sequence of observations. The goal is to make statistical inference as quickly as possible, while controlling the false-alarm rate. An important and commonly studied sequential analysis problem is sequential change-point detection [1].
It arises from various applications including online anomaly detection, statistical quality control, biosurveillance, financial arbitrage detection and network security monitoring (see, e.g., [2–4]). We are interested in the sequential change-point detection problem with known pre-change parameters but unknown post-change parameters. Specifically, given a sequence of samples $X_1, X_2, \ldots$, we assume that they are independent and identically distributed (i.i.d.) with some distribution $f_\theta$ parameterized by $\theta$, and that the values of $\theta$ differ before and after some unknown time called the change-point. We further assume that the parameters before the change-point are known. This is reasonable since it is usually relatively easy to obtain reference data for the normal state, so that the parameters in the normal state can be estimated with good accuracy. After the change-point, however, the values of the parameters switch to some unknown values, which represent anomalies or novelties that need to be discovered.

1.1. Motivation: Dilemma of CUSUM and Generalized Likelihood Ratio (GLR) Statistics

Consider change-point detection with unknown post-change parameters. A commonly used change-point detection method is the so-called CUSUM procedure [4], which can be derived from likelihood ratios.

Entropy 2018, 20, 108; doi:10.3390/e

Assume that before the change the samples $X_i$ follow a distribution $f_{\theta_0}$, and after the change the samples $X_i$ follow another distribution $f_{\theta_1}$. The CUSUM procedure has a recursive structure: initialized with $W_0 = 0$, the likelihood-ratio statistic can be computed according to

  $W_{t+1} = \max\{W_t + \log(f_{\theta_1}(X_{t+1})/f_{\theta_0}(X_{t+1})),\, 0\}$,

and a change-point is detected whenever $W_t$ exceeds a pre-specified threshold. Due to the recursive structure, CUSUM is memory and computation efficient: it does not need to store the historical data and only needs to record the value of $W_t$. The performance of CUSUM depends on the choice of the post-change parameter $\theta_1$; in particular, there must be a well-defined notion of distance between $\theta_0$ and $\theta_1$. However, the choice of $\theta_1$ is somewhat subjective. Even if in practice a reasonable choice of $\theta_1$ is the smallest change of interest, in the multi-dimensional setting it is hard to define what the smallest change would mean. Moreover, when the assumed parameter $\theta_1$ deviates significantly from the true parameter value, CUSUM may suffer severe performance degradation [5]. An alternative approach is the Generalized Likelihood Ratio (GLR) statistic based procedure [6]. The GLR statistic finds the maximum likelihood estimate (MLE) of the post-change parameter and plugs it back into the likelihood ratio to form the detection statistic. To be more precise, for each hypothetical change-point location $k$, the corresponding post-change samples are $\{X_{k+1}, \ldots, X_t\}$. Using these samples, one can form the MLE, denoted $\hat\theta_{k+1,t}$. Without knowing beforehand whether and where the change occurs, when forming the GLR statistic we have to maximize over all possible change locations. The GLR statistic is given by

  $\max_{1 \le k < t} \sum_{i=k+1}^{t} \log(f_{\hat\theta_{k+1,t}}(X_i)/f_{\theta_0}(X_i))$,

and a change is announced whenever it exceeds a pre-specified threshold. The GLR statistic is more robust than CUSUM [7], and it is particularly useful when the post-change parameter may vary from one situation to another.
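As an illustration of the recursion above, here is a minimal sketch of CUSUM for a unit-variance Gaussian mean shift; the function name, threshold and parameter values are our own illustrative choices, not from the paper.

```python
def cusum_gaussian_mean(samples, theta0=0.0, theta1=1.0, sigma=1.0, b=5.0):
    """CUSUM for a mean shift theta0 -> theta1 in Gaussian data (sketch).

    W_{t+1} = max(W_t + log f_theta1(x)/f_theta0(x), 0); alarm when W_t >= b.
    """
    W = 0.0
    for t, x in enumerate(samples, start=1):
        # log-likelihood ratio of N(theta1, sigma^2) vs N(theta0, sigma^2)
        llr = (theta1 - theta0) * (x - (theta0 + theta1) / 2.0) / sigma**2
        W = max(W + llr, 0.0)
        if W >= b:
            return t  # alarm time
    return None  # no alarm raised
```

Note how the statistic resets to zero under pre-change data, so no history needs to be stored.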
In simple cases, the MLE $\hat\theta_{k+1,t}$ may have a closed-form expression and may be evaluated recursively. For instance, when the post-change distribution is Gaussian with mean $\theta$ [8],

  $\hat\theta_{k+1,t} = \frac{1}{t-k}\sum_{i=k+1}^{t} X_i$, and $\hat\theta_{k+1,t+1} = \frac{t-k}{t-k+1}\,\hat\theta_{k+1,t} + \frac{X_{t+1}}{t-k+1}$.

However, in more complex situations, the MLE $\hat\theta_{k+1,t}$ generally has no recursive form and cannot be evaluated using simple summary statistics. One such instance is given in Section 1.2. Another instance is when there is a constraint on the MLE, such as sparsity. In these cases, one has to store the historical data and recompute the MLE $\hat\theta_{k+1,t}$ whenever new data arrive, which is neither memory nor computationally efficient. For these cases, as a remedy, the window-limited GLR is usually considered, where only the past $w$ samples are stored and the maximization is restricted to $k \in (t-w, t]$. However, even with the window-limited GLR, one still has to recompute $\hat\theta_{k,t}$ from historical data whenever new data are added. Besides CUSUM and GLR, various online change-point detection procedures using one-sample updates have been considered, which replace the MLE with a simple recursive estimator. The one-sample update estimator takes the form $\hat\theta_{k,t} = h(X_t, \hat\theta_{k,t-1})$ for some function $h$ that uses only the most recent sample and the previous estimate. The estimates are then plugged into the likelihood ratio statistic to perform detection. Online convex optimization algorithms (such as online mirror descent) are a natural approach to constructing these estimators (see, e.g., [9,10]). Such a scheme provides a more versatile approach to developing detection procedures for complex situations where the exact MLE does not have a recursive form, or even a closed-form expression. The one-sample update is computationally efficient, as information from each new sample can be incorporated at low cost; it is also memory efficient, since the update needs only the most recent sample. The one-sample update estimators may not correspond to the exact MLE, but they tend to yield good detection performance.
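For the Gaussian case above, the recursive MLE is just a running mean; a minimal sketch (the function name is ours):

```python
def recursive_mle_update(theta_hat, n, x_new):
    """One step of the recursive Gaussian-mean MLE: given the estimate
    theta_hat based on n samples, fold in x_new without storing history."""
    return (n / (n + 1)) * theta_hat + x_new / (n + 1)
```

Folding this update over a stream reproduces the batch sample mean exactly, which is what makes the closed-form Gaussian case so convenient.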
However, in general, there are no performance guarantees for such an approach. This is the question we aim to address in this paper.

1.2. Application Scenario: Social Network Change-Point Detection

The widespread use of social networks (such as Twitter) leads to a large amount of user-generated data produced continuously. One important task is to detect change-points in streaming social network data. These change-points may represent the collective anticipation of, or response to, external

events or system shocks [11]. Detecting such changes can provide a better understanding of the patterns of social life. In social networks, a common form of data is discrete events over continuous time. As a simplification, each event contains a time label and a user label in the network. In our prior work [12], we model discrete events using network point processes, which capture the influence between users through an influence matrix. We then cast the problem as detecting changes in the influence matrix, assuming that the influence matrix in the normal state (before the change) can be estimated from reference data. After the change, the influence matrix is unknown (since it represents an anomaly) and has to be estimated online. Because the scale of the network tends to be large, computational burden and memory constraints mean that we do not want to store the entire history, but rather compute the statistic in real time. A simulated example illustrating this case is shown in a later section.

1.3. Contributions

This paper has two main contributions. First, we present a general approach based on online convex optimization (OCO) for constructing the estimators for the one-sided sequential hypothesis test and sequential change-point detection, following the non-anticipating approach of [8], when the MLE cannot be computed in a convenient recursive form. Second, we provide a proof of the near second-order asymptotic optimality of this approach when a logarithmic regret property is satisfied and the distributions are from an exponential family. Nearly second-order asymptotic optimality [14] means that the upper bound on performance matches the lower bound up to a log-log factor as the false-alarm rate tends to zero. Inspired by the existing connection between sequential analysis and online convex optimization in [13,14], we prove the near optimality by leveraging the logarithmic regret property of online mirror descent (OMD) and the lower bounds established in the statistical sequential change-point literature [14,15].
More precisely, we provide a general upper bound for the one-sided sequential hypothesis test and change-point detection procedures with one-sample update schemes. The upper bound explicitly captures the impact of estimation on detection through an estimation-algorithm-dependent factor. This factor shows up as an additional term in the upper bound on the expected detection delay, and it corresponds to the regret incurred by the one-sample update estimators. This establishes an interesting link between sequential change-point detection and online convex optimization. Although both fields study sequential data, the precise connection between them has not been clear, partly because the performance metrics are different: the former is concerned with the tradeoff between average run length and detection delay, whereas the latter focuses on bounding the cumulative loss incurred by the sequence of estimators through a regret bound [14,16]. Synthetic examples validate the performance of the one-sample update schemes. Here we focus on OMD estimators, but the results can be generalized to other OCO schemes such as online gradient descent.

1.4. Literature and Related Work

Sequential change-point detection is a classic subject with an extensive literature. Much success has been achieved when the pre-change and post-change distributions are exactly specified: for example, the CUSUM procedure [17], with first-order asymptotic optimality [18] and exact optimality [19] in the minimax sense, and the Shiryaev–Roberts (SR) procedure [20], derived from Bayesian principles, which also enjoys various optimality properties. Both the CUSUM and SR procedures rely on likelihood ratios between the specified pre-change and post-change distributions. There are two main approaches to dealing with unknown post-change parameters. The first is the GLR approach [7,21–24], and the second is a mixture approach [15,25]. The GLR statistic enjoys certain optimality properties, but it cannot be computed recursively in many cases [23].
To address the infinite memory issue, [7,21] studied the window-limited GLR procedure. The main advantage of the mixture approach is that it allows easy evaluation of a threshold that guarantees the desired

false alarm constraint. A disadvantage of this approach is that there may be no natural way to select the weight function, in particular when there is no conjugate prior. This motivated a third approach to the problem, proposed first by Robbins and Siegmund in the context of hypothesis testing, and then by Lorden and Pollak [8] for sequential change detection. This approach replaces the unknown parameter with a non-anticipating estimator, which can be easier to find even when there is no conjugate prior, as in the Gamma example considered in [8,25]. That work developed a modified SR procedure by introducing a prior distribution on the unknown parameters. While the non-anticipating estimator approach [8,24] enjoys recursive, and thus efficient, computation of the likelihood-ratio-based detection statistics, its constructions of recursive estimators (based on the MLE or the method of moments) cannot easily be extended to more complex cases (for instance, multi-dimensional parameters with constraints). Here, we consider a general and convenient approach for constructing non-anticipating estimators based on online convex optimization, which is particularly useful in these complex cases. Our work provides an alternative proof of nearly second-order asymptotic optimality by building a connection to online convex optimization and leveraging regret-bound-type results [14]. For a one-dimensional Gaussian mean shift without any constraint, we recover the second-order asymptotic optimality, namely Theorem 3.3 in [24]. Recent work [26] also treats the problem where the pre-change distribution has unknown parameters. Another related problem is sequential joint estimation and detection, but its goal is different, in that one aims to achieve both good detection and good estimation performance, whereas in our setting estimation is only needed for computing the detection statistics.
These works include [27] and [28], which study a joint detection and estimation problem of a specific form arising in many applications such as spectrum sensing [29], image observations [30], and MIMO radar [31]: a linear scalar observation model with Gaussian noise, where under the alternative hypothesis there is an unknown multiplicative parameter. The paper [27] demonstrates that solving the joint problem by treating detection and estimation separately with the corresponding optimal procedures does not yield overall optimal performance, and provides an elegant closed-form optimal detector; [28] later generalizes these results. There are also other approaches to the joint detection-estimation problem using multiple hypothesis testing [30,32] and Bayesian formulations [33]. Related work using online convex optimization for anomaly detection includes [9], which develops an efficient detector for the exponential family using online mirror descent and proves a logarithmic regret bound, and [10], which dynamically adjusts the detection threshold to incorporate feedback about the decision outcomes. However, these works consider a different setting, in which the change is a transient outlier rather than a persistent change, as assumed in the classic statistical change-point detection literature. When there is a persistent change, it is important to accumulate evidence by pooling the post-change samples (our work considers the persistent change). Extensive work has been done on parameter estimation in the online setting. This includes online density estimation over the exponential family by regret minimization [9,10,16], sequential prediction of individual sequences with the logarithmic loss [3,34], online prediction for time series [35], and sequential NML (SNML) prediction [34], which achieves the optimal regret bound. Our problem differs from the above in that estimation is not the end goal; one performs parameter estimation only to plug the estimates back into the likelihood function for detection.
Moreover, a subtle but important difference in our work is that the loss function in online density estimation is $-\log f_{\hat\theta_i}(X_i)$, whereas our loss function is $-\log f_{\hat\theta_{i-1}}(X_i)$, in order to retain the martingale property, which is essential for establishing nearly second-order asymptotic optimality.

2. Preliminaries

Consider a sequence of i.i.d. random variables $X_1, X_2, \ldots$ with probability density function of a parametric form $f_\theta$. The parameter $\theta$ may be unknown. We consider two related problems: the one-sided sequential hypothesis test and sequential change-point detection. The detection statistic relies on a sequence of estimators $\{\hat\theta_t\}$ constructed using online mirror descent. The OMD uses a simple one-sample

update: the update from $\hat\theta_{t-1}$ to $\hat\theta_t$ uses only the current sample $X_t$. This is the main difference from the traditional generalized likelihood ratio (GLR) statistic [7], where each $\hat\theta_t$ is estimated using all historical samples. In the following, we present detailed descriptions of the two problems. We consider exponential family distributions and present our non-anticipating estimator based on the one-sample update.

2.1. One-Sided Sequential Hypothesis Test

First, we consider a one-sided sequential hypothesis test in which the goal is only to reject the null hypothesis. This is a special case of the change-detection problem in which the change-point can only be either 0 or $\infty$ (meaning it never occurs). Studying this special case will give us an important intermediate step towards solving the sequential change-detection problem. Consider the null hypothesis $H_0: \theta = \theta_0$ versus the alternative $H_1: \theta \neq \theta_0$; hence, the parameter under the alternative is unknown. The classic approach to this problem is the one-sided sequential probability ratio test (SPRT) [36]: at each time $t$, given samples $\{X_1, \ldots, X_t\}$, the decision is either to reject $H_0$ or to take more samples if the rejection decision cannot be made confidently. Here, we introduce a modified one-sided SPRT with a sequence of non-anticipating plug-in estimators:

  $\hat\theta_t := \hat\theta_t(X_1, \ldots, X_t), \quad t = 1, 2, \ldots$  (1)

Define the test statistic at time $t$ as

  $L_t = \prod_{i=1}^{t} \frac{f_{\hat\theta_{i-1}}(X_i)}{f_{\theta_0}(X_i)}$.  (2)

The test statistic has a simple recursive implementation:

  $L_t = L_{t-1} \cdot \frac{f_{\hat\theta_{t-1}}(X_t)}{f_{\theta_0}(X_t)}$.

Define a sequence of $\sigma$-algebras $\{\mathcal{F}_t\}$, where $\mathcal{F}_t = \sigma(X_1, \ldots, X_t)$. The test statistic has the martingale property due to its non-anticipating nature: $\mathbb{E}[L_t \mid \mathcal{F}_{t-1}] = L_{t-1}$, where the expectation is taken when $X_1, X_2, \ldots$ are i.i.d. random variables drawn from $f_{\theta_0}$. The decision rule is a stopping time

  $\tau(b) = \min\{t \geq 1 : \log L_t \geq b\}$,  (3)

where $b > 0$ is a pre-specified threshold. We reject the null hypothesis whenever the statistic exceeds the threshold.
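A minimal sketch of this modified SPRT for a unit-variance Gaussian, with the running mean as the non-anticipating plug-in estimator (illustrative code of ours, not from the paper; the threshold is arbitrary):

```python
def one_sided_sprt(samples, theta0=0.0, b=4.0):
    """One-sided SPRT with a non-anticipating plug-in estimator (sketch).

    For unit-variance Gaussian data, log L_t = log L_{t-1}
    + log f_{theta_hat_{t-1}}(X_t) - log f_{theta0}(X_t); theta_hat is the
    running mean of past samples, updated only AFTER X_t is scored."""
    logL, theta_hat = 0.0, theta0  # theta_hat_0 initialized at theta0
    for t, x in enumerate(samples, start=1):
        # log-likelihood ratio term uses the previous estimate (non-anticipating)
        logL += (theta_hat - theta0) * (x - (theta_hat + theta0) / 2.0)
        theta_hat += (x - theta_hat) / t  # one-sample update (running mean)
        if logL >= b:
            return t  # stopping time tau(b)
    return None  # never rejected H0
```

The key point visible in the code is the ordering: each $X_t$ enters the likelihood ratio before it touches the estimator, which is exactly what preserves the martingale property.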
The goal is to reject the null hypothesis using as few samples as possible, subject to a false-alarm rate (Type-I error) constraint.

2.2. Sequential Change-Point Detection

Now we consider the sequential change-point detection problem. A change may occur at an unknown time $n$, altering the underlying distribution of the data. One would like to detect such a change as quickly as possible. Formally, change-point detection can be cast as the following hypothesis test:

  $H_0: X_1, X_2, \ldots \sim f_{\theta_0}$ i.i.d.,
  $H_1: X_1, \ldots, X_n \sim f_{\theta_0}$ i.i.d., $\; X_{n+1}, X_{n+2}, \ldots \sim f_{\theta}$ i.i.d.  (4)

Here, the unknown $\theta$ represents the anomaly. The goal is to detect the change as quickly as possible after it occurs, subject to a false-alarm rate constraint. We consider likelihood-ratio-based

detection procedures adapted from two types of existing ones, which we call the adaptive CUSUM (ACM) and the adaptive SRRS (ASR) procedures. For change-point detection, the post-change parameter is estimated using post-change samples. This means that, for each putative change-point location $k < t$ before the current time, the post-change samples are $\{X_k, \ldots, X_t\}$; with a slight abuse of notation, the post-change parameter is estimated as

  $\hat\theta_{k,i} = \hat\theta_{k,i}(X_k, \ldots, X_i), \quad i \geq k$.  (5)

Therefore, for $k = 1$, $\hat\theta_{k,i}$ becomes $\hat\theta_i$ as defined in (1) for the one-sided SPRT. Initialize with $\hat\theta_{k,k-1} = \theta_0$. The likelihood ratio at time $t$ for a hypothetical change-point location $k$ is given by

  $L_{k,t} = \prod_{i=k}^{t} \frac{f_{\hat\theta_{k,i-1}}(X_i)}{f_{\theta_0}(X_i)}$,  (6)

where $L_{k,t}$ can be computed recursively, similar to (2). Since we do not know the change-point location $n$, following the maximum likelihood principle we take the maximum of the statistics over all possible values of $k$. This gives the ACM procedure:

  $T_{\mathrm{ACM}}(b_1) = \inf\{t : \max_{1 \leq k \leq t} \log L_{k,t} > b_1\}$,  (7)

where $b_1$ is a pre-specified threshold. Similarly, by replacing the maximization over $k$ in (7) with summation, we obtain the following ASR procedure [8], which can be interpreted as a Bayesian statistic similar to the Shiryaev–Roberts procedure:

  $T_{\mathrm{ASR}}(b_2) = \inf\{t : \log(\sum_{k=1}^{t} L_{k,t}) > b_2\}$,  (8)

where $b_2$ is a pre-specified threshold. The computation of $L_{k,t}$ and the estimators $\{\hat\theta_t\}$, $\{\hat\theta_{k,t}\}$ are discussed later in Section 2.4. For a fixed $k$, the comparison between our methods and the GLR is illustrated in Figure 1.

Remark 1. In practice, to prevent the memory and computational complexity from blowing up as time goes to infinity, we can use window-limited versions of the detection procedures in (7) and (8), obtained by replacing $\max_{1 \leq k \leq t}$ with $\max_{t-w \leq k \leq t}$ in (7) and $\sum_{k=1}^{t}$ with $\sum_{k=t-w}^{t}$ in (8), where $w$ is a prescribed window size. Although we do not provide a theoretical analysis of the window-limited versions, we refer the reader to [7] for the choice of $w$ in window-limited GLR procedures.
Figure 1. Comparison of the update scheme for GLR and our methods when a new sample arrives: the GLR recomputes the MLE $\hat\theta_{k,t+1}$ from all samples $X_k, \ldots, X_{t+1}$, whereas the one-sample update computes $\hat\theta_{k,t+1}$ from $\hat\theta_{k,t}$ and $X_{t+1}$ alone, with the recursive ratio update $\Lambda_{k,t+1} = \Lambda_{k,t} \cdot f_{\hat\theta_{k,t}}(X_{t+1})/f_{\theta_0}(X_{t+1})$.
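The window-limited ACM procedure of (7) can be sketched as follows for a unit-variance Gaussian mean shift, again with the running mean as the one-sample update estimator (our illustrative code; threshold and window size are arbitrary):

```python
def acm_window(samples, theta0=0.0, b=8.0, w=20):
    """Window-limited ACM sketch (unit-variance Gaussian, running-mean update).

    For each putative change point k in the window, maintain the recursively
    updated log L_{k,t} and estimator theta_hat_{k,t}; alarm when
    max_k log L_{k,t} > b."""
    stats = []  # [logL, theta_hat, n] for each active hypothesis k
    for t, x in enumerate(samples, start=1):
        stats.append([0.0, theta0, 0])  # new hypothesis: change at k = t
        for s in stats:
            logL, th, n = s
            # recursive ratio update uses the previous estimate for this k
            s[0] = logL + (th - theta0) * (x - (th + theta0) / 2.0)
            s[1] = th + (x - th) / (n + 1)  # one-sample update
            s[2] = n + 1
        if len(stats) > w:
            stats.pop(0)  # keep only the last w putative change points
        if max(s[0] for s in stats) > b:
            return t  # T_ACM
    return None
```

Each hypothesis in the window carries only a scalar statistic and a scalar estimate, so memory is $O(w)$ regardless of the stream length, in contrast to the window-limited GLR, which re-estimates from raw samples.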

2.3. Exponential Family

In this paper, we focus on $f_\theta$ in an exponential family, for the following reasons: (i) the exponential family [10] represents a very rich class of parametric, and even many nonparametric, statistical models [37]; (ii) the negative log-likelihood $-\log f_\theta(x)$ of an exponential family is convex, which allows us to perform online convex optimization. Some useful properties of the exponential family are briefly summarized below; full proofs can be found in [10,38]. Consider an observation space $\mathcal{X}$ equipped with a sigma-algebra $\mathcal{B}$ and a sigma-finite measure $H$ on $(\mathcal{X}, \mathcal{B})$. Assume the number of parameters is $d$. Let $x^\top$ denote the transpose of a vector or matrix. Let $\phi: \mathcal{X} \to \mathbb{R}^d$ be an $H$-measurable function $\phi(x) = (\phi_1(x), \ldots, \phi_d(x))^\top$; here $\phi(x)$ corresponds to the sufficient statistic for $\theta$. Let $\Theta$ denote the parameter space in $\mathbb{R}^d$, and let $\{P_\theta, \theta \in \Theta\}$ be a set of probability distributions with respect to the measure $H$. Then $\{P_\theta, \theta \in \Theta\}$ is said to be a multivariate exponential family with natural parameter $\theta$ if the probability density function of each $f_\theta \in P_\theta$ with respect to $H$ can be expressed as

  $f_\theta(x) = \exp\{\theta^\top \phi(x) - F(\theta)\}$.

In this definition, the so-called log-partition function is given by

  $F(\theta) := \log \int_{\mathcal{X}} \exp(\theta^\top \phi(x))\, dH(x)$.

To make $f_\theta(x)$ a well-defined probability density, we consider the following two parameter sets:

  $\Theta = \{\theta \in \mathbb{R}^d : \log \int_{\mathcal{X}} \exp(\theta^\top \phi(x))\, dH(x) < +\infty\}$, and $\Theta_s = \{\theta \in \Theta : \nabla^2 F(\theta) \succeq s I_{d \times d}\}$.

Note that $-\log f_\theta(x)$ is $s$-strongly convex over $\Theta_s$. The gradient satisfies $\nabla F(\theta) = \mathbb{E}_\theta[\phi(X)]$, and the Hessian $\nabla^2 F(\theta)$ is the covariance matrix of the vector $\phi(X)$. Therefore, $\nabla^2 F(\theta)$ is positive semidefinite and $F(\theta)$ is convex. Moreover, $F$ is a Legendre function, which means it is strongly convex, continuously differentiable and essentially smooth [38]. The Legendre–Fenchel dual $F^*$ is defined as

  $F^*(z) = \sup_{u \in \Theta}\{u^\top z - F(u)\}$.

The mapping $\nabla F^*$ is the inverse mapping of $\nabla F$ [39]; moreover, if $F$ is a strongly convex function, then $\nabla F^* = (\nabla F)^{-1}$.
A general measure of proximity used in OMD is the so-called Bregman divergence $B_F$, a nonnegative function induced by a Legendre function $F$ (see, e.g., [10,38]), defined as

  $B_F(u, v) := F(u) - F(v) - \langle \nabla F(v), u - v \rangle$.  (9)

For the exponential family, a natural choice of Bregman divergence is the Kullback–Leibler (KL) divergence. Define $\mathbb{E}_\theta$ as the expectation when $X$ is a random variable with density $f_\theta$, and $I(\theta_1, \theta_2)$ as the KL divergence between the distributions with densities $f_{\theta_1}$ and $f_{\theta_2}$, for any $\theta_1, \theta_2 \in \Theta$:

  $I(\theta_1, \theta_2) = \mathbb{E}_{\theta_1}[\log(f_{\theta_1}(X)/f_{\theta_2}(X))]$.  (10)

It can be shown that, for the exponential family,

  $I(\theta_1, \theta_2) = F(\theta_2) - F(\theta_1) - (\theta_2 - \theta_1)^\top \nabla F(\theta_1)$.

Using definition (9), this means that

  $B_F(\theta_1, \theta_2) := I(\theta_2, \theta_1)$  (11)

is a Bregman divergence. This property is useful for constructing mirror descent estimators for the exponential family [39,40].
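A quick numerical check of (9) and (11) for a unit-variance Gaussian in its natural parameterization, where $F(\theta) = \theta^2/2$ and both $B_F(u, v)$ and the KL divergence $I(v, u)$ reduce to $(u - v)^2/2$ (a scalar sketch of ours, not library code):

```python
def bregman(F, grad_F, u, v):
    """Bregman divergence B_F(u, v) = F(u) - F(v) - <grad F(v), u - v>, scalars."""
    return F(u) - F(v) - grad_F(v) * (u - v)

# Unit-variance Gaussian, natural parameter theta (= mean): F(theta) = theta^2/2
F = lambda theta: 0.5 * theta * theta
grad_F = lambda theta: theta

def kl_gaussian(theta1, theta2):
    """KL divergence I(theta1, theta2) between N(theta1, 1) and N(theta2, 1)."""
    return 0.5 * (theta1 - theta2) ** 2
```

Here `bregman(F, grad_F, u, v)` equals `kl_gaussian(v, u)`, with the arguments swapped exactly as in (11).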

2.4. Online Convex Optimization (OCO) Algorithms for Non-Anticipating Estimators

Online convex optimization (OCO) algorithms [41] can be interpreted as a player making sequential decisions. At the time of each decision, the outcome is unknown to the player. After committing to a decision, the decision maker suffers a loss, which can be adversarially chosen. An OCO algorithm makes decisions that, based on the observed outcomes, minimize the regret, i.e., the difference between the total loss incurred and that of the best fixed decision in hindsight. To design non-anticipating estimators, we consider OCO algorithms with likelihood-based regret functions. We iteratively estimate the parameter when each new observation becomes available, based on the maximum likelihood principle; hence the loss incurred is the negative log-likelihood of the new sample evaluated at the estimator, $\ell_t(\theta) := -\log f_\theta(X_t)$, which corresponds to the log-loss in [3]. Given samples $X_1, \ldots, X_t$, the regret of a sequence of estimators $\{\hat\theta_i\}$ generated by a likelihood-based OCO algorithm $a$ is defined as

  $R_t^a = \sum_{i=1}^{t}\{-\log f_{\hat\theta_{i-1}}(X_i)\} - \inf_{\theta \in \Theta} \sum_{i=1}^{t}\{-\log f_\theta(X_i)\}$.  (12)

Below, we occasionally omit the superscript $a$ for notational simplicity. In this paper, we consider a generic OCO procedure called the online mirror descent (OMD) algorithm [4,41]. Next, we discuss how to construct the non-anticipating estimators $\{\hat\theta_t\}$ in (1), and $\{\hat\theta_{k,t}\}$, $k = 1, 2, \ldots$, in (5) using OMD. The main idea of OMD is the following: at each time step, the estimator $\hat\theta_t$ is updated using the new sample $X_t$, balancing the tendency to stay close to the previous estimate against the tendency to move in the direction of the greatest local decrease of the loss function. For the loss function defined above, a sequence of OMD estimators is constructed by

  $\hat\theta_t = \arg\min_{u \in G}\, [u^\top \nabla \ell_t(\hat\theta_{t-1}) + \tfrac{1}{\eta_t} B_F(u, \hat\theta_{t-1})]$,  (13)

where $B_F$ is defined in (11). Here $G \subseteq \Theta_s$ is a closed convex set, which is problem-specific and encourages certain parameter structure such as sparsity.

Remark 2.
Similar to (13), for any fixed $k$ we can compute $\{\hat\theta_{k,t}\}$ via OMD for sequential change-point detection. The only difference is that $\{\hat\theta_{k,t}\}$ is computed using $X_k$ as the first sample and then applying the recursive update (13) to $X_{k+1}, \ldots$; for $\hat\theta_t$, we use $X_1$ as the first sample.

There is an equivalent form of OMD, presented as the original formulation in [40]. The equivalent form is sometimes easier to use for algorithm development, and it consists of four steps: (1) compute the dual variable: $\hat\mu_{t-1} = \nabla F(\hat\theta_{t-1})$; (2) perform the dual update: $\hat\mu_t = \hat\mu_{t-1} - \eta_t \nabla \ell_t(\hat\theta_{t-1})$; (3) compute the primal variable: $\tilde\theta_t = (\nabla F)^{-1}(\hat\mu_t)$; (4) perform the projected primal update: $\hat\theta_t = \arg\min_{u \in G} B_F(u, \tilde\theta_t)$. The equivalence between this form of OMD and the nonlinear projected subgradient approach in (13) is proved in [39]. We adopt this approach when deriving our algorithm and follow the same strategy as [9]. Algorithm 1 summarizes the steps [42]. For strongly convex loss functions, the regret of many OCO algorithms, including OMD, satisfies $R_n \leq C \log n$ for some constant $C$ (depending on $f_\theta$ and $\Theta_s$) and any positive integer $n$ [10,43]. Note that for the exponential family, the loss function is the negative log-likelihood, which is strongly convex over $\Theta_s$; hence, we have the logarithmic regret property.

Algorithm 1 Online mirror descent for non-anticipating estimators.

Require: Exponential family specifications $\phi(x)$, $F(\theta)$ and $f_\theta(x)$; initial parameter value $\theta_0$; sequence of data $X_1, \ldots, X_t, \ldots$; a closed, convex parameter set $G \subseteq \Theta_s$; a decreasing sequence $\{\eta_t\}$ of strictly positive step sizes.
1: $\hat\theta_0 = \theta_0$, $L_0 = 1$. {Initialization}
2: for all $t = 1, 2, \ldots,$ do
3:   Acquire a new observation $X_t$
4:   Compute the loss $\ell_t(\hat\theta_{t-1}) := -\log f_{\hat\theta_{t-1}}(X_t) = F(\hat\theta_{t-1}) - \hat\theta_{t-1}^\top \phi(X_t)$
5:   Compute the likelihood ratio $L_t = L_{t-1} f_{\hat\theta_{t-1}}(X_t)/f_{\theta_0}(X_t)$
6:   $\hat\mu_{t-1} = \nabla F(\hat\theta_{t-1})$, $\hat\mu_t = \hat\mu_{t-1} - \eta_t(\hat\mu_{t-1} - \phi(X_t))$ {Dual update}
7:   $\tilde\theta_t = (\nabla F)^{-1}(\hat\mu_t)$
8:   $\hat\theta_t = \arg\min_{u \in G} B_F(u, \tilde\theta_t)$ {Projected primal update}
9: end for
10: return $\{\hat\theta_t\}$ and $\{L_t\}$.

3. Nearly Second-Order Asymptotic Optimality of One-Sample Update Schemes

Below, we prove the nearly second-order asymptotic optimality of the one-sample update schemes. More precisely, nearly second-order asymptotic optimality means that the algorithm attains the lower performance bound asymptotically, up to a log-log factor in the false-alarm rate, as the false-alarm rate tends to zero (in many cases the log-log factor is a small number). We first introduce some necessary notation. Denote by $P_{\theta,n}$ and $\mathbb{E}_{\theta,n}$ the probability measure and expectation when the change occurs at time $n$ and the post-change parameter is $\theta$, i.e., when $X_1, \ldots, X_n$ are i.i.d. random variables with density $f_{\theta_0}$ and $X_{n+1}, X_{n+2}, \ldots$ are i.i.d. random variables with density $f_\theta$. Moreover, let $P_\infty$ and $\mathbb{E}_\infty$ denote the probability measure and expectation when there is no change, i.e., $X_1, X_2, \ldots$ are i.i.d. random variables with density $f_{\theta_0}$. Finally, let $\mathcal{F}_t$ denote the $\sigma$-algebra generated by $X_1, \ldots, X_t$ for $t \geq 1$.

3.1. One-Sided Sequential Hypothesis Test

Recall that the decision rule for the sequential hypothesis test is the stopping time $\tau(b)$ defined in (3). The two standard performance metrics are the false-alarm rate, denoted by $P_\infty(\tau(b) < \infty)$, and the expected detection delay (i.e., the expected number of samples needed to reject the null), denoted by $\mathbb{E}_{\theta,0}[\tau(b)]$. A meaningful test should have both a small $P_\infty(\tau(b) < \infty)$ and a small $\mathbb{E}_{\theta,0}[\tau(b)]$.
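For concreteness, the steps of Algorithm 1 can be sketched for a unit-variance Gaussian mean, where $\phi(x) = x$, $F(\theta) = \theta^2/2$ and $\nabla F(\theta) = \theta$, so the primal and dual variables coincide and the Bregman projection onto an interval $G$ reduces to clipping. This is our illustrative sketch, not the paper's code; with $\eta_t = 1/t$ and no clipping, the estimator is the running mean.

```python
def omd_gaussian(samples, theta0=0.0, G=(-5.0, 5.0)):
    """Algorithm 1 sketch for a unit-variance Gaussian mean (natural
    parameter theta; phi(x) = x, F(theta) = theta^2/2, grad F = identity).

    Returns the estimator sequence {theta_hat_t} and {log L_t}."""
    theta, logL = theta0, 0.0
    ests, logLs = [], []
    for t, x in enumerate(samples, start=1):
        # step 5: likelihood ratio uses the previous (non-anticipating) estimate
        logL += (theta - theta0) * (x - (theta + theta0) / 2.0)
        mu = theta                        # step 6: dual variable mu = grad F(theta)
        mu -= (1.0 / t) * (mu - x)        # dual update with eta_t = 1/t
        theta = min(max(mu, G[0]), G[1])  # steps 7-8: primal update + projection
        ests.append(theta)
        logLs.append(logL)
    return ests, logLs
```

With these choices the update visibly reproduces the recursive MLE of Section 1.1 whenever the iterate stays inside $G$, while the clip shows how a constraint set enters through the projected primal step.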
Usually, one adjusts the threshold $b$ to control the false-alarm rate below a certain level. Our main result is the following. As observed in [23], there is a loss of statistical efficiency in using one-sample update estimators relative to the GLR approach, which uses all past samples $X_1, \ldots, X_t$. The theorem below shows that this loss corresponds to the expected regret defined in (12).

Theorem 1 (Upper bound for OCO-based SPRT). Let $\{\hat\theta_t\}$ be a sequence of non-anticipating estimators generated by an OCO algorithm $a$. As $b \to \infty$,

  $\mathbb{E}_{\theta,0}[\tau(b)] \leq \frac{b}{I(\theta, \theta_0)} + \frac{\mathbb{E}_{\theta,0}[R^a_{\tau(b)}]}{I(\theta, \theta_0)} + O(1)$.  (14)

Here $O(1)$ is a term upper-bounded by an absolute constant as $b \to \infty$. The main idea of the proof is to decompose the statistic $\log L_t$ defining $\tau(b)$ into a few terms that form martingales, and then invoke Wald's theorem for the stopped process.

Remark 3. The inequality (14) is valid for any sequence of non-anticipating estimators generated by an OCO algorithm. Moreover, (14) gives an explicit connection between the expected detection delay of the one-sided sequential hypothesis test (the left-hand side of (14)) and the regret of the OCO algorithm (the second term on the right-hand side of (14)). This clearly illustrates the impact of estimation on detection through an estimation-algorithm-dependent factor. Note that in the statement of the theorem, the stopping time $\tau(b)$ appears on the right-hand side of the inequality (14). For OMD, the expected sample size is usually small. By comparing with a specific regret bound $R_{\tau(b)}$, we can bound $\mathbb{E}_{\theta,0}[\tau(b)]$, as discussed in Section 4. The most important case is when the estimation algorithm has a logarithmic expected regret. For the exponential family, as shown in Section 3.3, Algorithm 1 can achieve $\mathbb{E}_{\theta,0}[R_n] \leq C \log n$ for any positive integer $n$. To obtain a more specific order for the upper bound on $\mathbb{E}_{\theta,0}[\tau(b)]$ as $b$ grows, we establish an upper bound on $\mathbb{E}_{\theta,0}[\tau(b)]$ as a function of $b$, yielding the following corollary.

Corollary 1. Let $\{\hat\theta_t\}$ be a sequence of non-anticipating estimators generated by an OCO algorithm $a$. Assume that $\mathbb{E}_{\theta,0}[R^a_n] \leq C \log n$ for any positive integer $n$ and some constant $C > 0$. Then

  $\mathbb{E}_{\theta,0}[\tau(b)] \leq \frac{b}{I(\theta, \theta_0)} + \frac{C \log b}{I(\theta, \theta_0)}(1 + o(1))$.  (15)

Here $o(1)$ is a vanishing term as $b \to \infty$. Corollary 1 shows that, beyond the well-known first-order approximation $b/I(\theta, \theta_0)$ [8,18], the expected detection delay $\mathbb{E}_{\theta,0}[\tau(b)]$ is bounded by an additional term of order $\log b$ when the estimation algorithm has logarithmic regret. This $\log b$ term plays an important role in establishing the optimality properties later. To show the optimality properties of the detection procedures, we first select a set of detection procedures with false-alarm rates lower than a prescribed value, and then prove that among all procedures in the set, the expected detection delays of our proposed procedures are nearly the smallest.
Thus, we can choose a threshold $b$ to uniformly control the false-alarm rate of $\tau(b)$.

Lemma 1 (False-alarm rate of $\tau(b)$). Let $\{\hat\theta_t\}$ be any sequence of non-anticipating estimators. For any $b > 0$, $P_\infty(\tau(b) < \infty) \leq \exp(-b)$.

Lemma 1 shows that as $b$ increases, the false-alarm rate of $\tau(b)$ decays exponentially fast. We can set $b = \log(1/\alpha)$ to make the false-alarm rate of $\tau(b)$ less than some $\alpha > 0$. Next, leveraging an existing lower bound for the general SPRT presented in [14], we establish the nearly second-order asymptotic optimality of the OMD-based SPRT as follows.

Corollary 2 (Nearly second-order optimality of OCO-based SPRT). Let $\{\hat\theta_t\}$ be a sequence of non-anticipating estimators generated by an OCO algorithm $a$. Assume that $\mathbb{E}_{\theta,0}[R^a_n] \leq C \log n$ for any positive integer $n$ and some constant $C > 0$. Define the set $\mathcal{C}(\alpha) = \{T : P_\infty(T < \infty) \leq \alpha\}$. For $b = \log(1/\alpha)$, by Lemma 1, $\tau(b) \in \mathcal{C}(\alpha)$. For such a choice, $\tau(b)$ is nearly second-order asymptotically optimal in the sense that, for any $\theta \in \Theta_s \setminus \{\theta_0\}$, as $\alpha \to 0$,

  $\mathbb{E}_{\theta,0}[\tau(b)] - \inf_{T \in \mathcal{C}(\alpha)} \mathbb{E}_{\theta,0}[T] = O(\log\log(1/\alpha))$.  (16)

This result means that, compared with any procedure (including the optimal procedure) calibrated to have a false-alarm rate less than $\alpha$, our procedure incurs at most an $O(\log\log(1/\alpha))$ increase in the expected detection delay, which is usually a small number. For instance, even in the conservative case where we set $\alpha = 10^{-5}$ to control the false-alarm rate, $\log\log(1/\alpha) \approx 2.44$.
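The constant quoted above is a one-line computation (our arithmetic check, not from the paper's code):

```python
import math

# Optimality gap log(log(1/alpha)) at a conservative false-alarm level
alpha = 1e-5
gap = math.log(math.log(1.0 / alpha))  # log(1/alpha) ≈ 11.51, so gap ≈ 2.44
```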

Sequential Change-Point Detection

Now we proceed with the proof by leveraging the close connection [8] between sequential change-point detection and the one-sided hypothesis test. For sequential change-point detection, the two commonly used performance metrics [4] are the average run length (ARL), denoted by E_∞[T], and the maximal conditional average detection delay (CADD), denoted by sup_{n≥0} E_{θ,n}[T − n | T > n]. The ARL is the expected number of samples between two successive false alarms, and the CADD is the expected number of samples needed to detect the change after it occurs. A good procedure should have a large ARL and a small CADD. As in the one-sided hypothesis test, one usually chooses the threshold large enough that the ARL exceeds a pre-specified level. Similar to Theorem 1, we provide an upper bound for the CADD of our ASR and ACM procedures.

Theorem 2. Consider the change-point detection procedures T_ACM(b₁) in (7) and T_ASR(b₂) in (8). For any fixed k, let {θ̂_{k,t}} be a sequence of non-anticipating estimators generated by an OCO algorithm a. Let b₁ = b₂ = b; then as b → ∞ we have

    sup_{n≥0} E_{θ,n}[T_ASR(b) − n | T_ASR(b) > n] ≤ sup_{n≥0} E_{θ,n}[T_ACM(b) − n | T_ACM(b) > n]
        ≤ (I(θ, θ_0))⁻¹ (b + E_{θ,0}[R^a_{τ(b)}]) + O(1).    (17)

To prove Theorem 2, we relate the ASR and ACM procedures to the one-sided hypothesis test and use the fact that, when the measure P_∞ is known, sup_{n≥0} E_{θ,n}[T − n | T > n] is attained at n = 0 for both the ASR and ACM procedures. As above, we may apply an argument similar to Corollary 1 to remove the dependence on τ(b) on the right-hand side of the inequality.

We establish the following lower bound for the ARL of the detection procedures, which is needed to prove Corollary 3:

Lemma 2 (ARL). Consider the change-point detection procedures T_ACM(b₁) in (7) and T_ASR(b₂) in (8). For any fixed k, let {θ̂_{k,t}} be any sequence of non-anticipating estimators. Let b₁ = b₂ = b. Given a prescribed lower bound γ > 0 for the ARL, provided that b ≥ log γ we have
    E_∞[T_ACM(b)] ≥ E_∞[T_ASR(b)] ≥ γ.

Lemma 2 shows that, given a required lower bound γ for the ARL, we can choose b = log γ to make the ARL greater than γ. This is consistent with earlier works [8,25], which show that the smallest threshold b such that E_∞[T_ACM(b)] ≥ γ is approximately log γ. However, the bound in Lemma 2 is not tight, since in practice we can set b = ρ log γ for some ρ ∈ (0, 1) to ensure that the ARL is greater than γ. Combining the upper bound in Theorem 2 with an existing lower bound for the CADD of the SRRS procedure in [15], we obtain the following optimality properties.

Corollary 3 (Nearly second-order asymptotic optimality of ACM and ASR). Consider the change-point detection procedures T_ACM(b₁) in (7) and T_ASR(b₂) in (8). For any fixed k, let {θ̂_{k,t}} be a sequence of non-anticipating estimators generated by an OCO algorithm a. Assume that E_{θ,0}[R_n^a] ≤ C log n for any positive integer n and some constant C > 0. Let b₁ = b₂ = b. Define S(γ) = {T : E_∞[T] ≥ γ}. For b = log γ, due to Lemma 2, both T_ASR(b) and T_ACM(b) belong to S(γ). For such b, both T_ASR(b) and T_ACM(b) are nearly second-order asymptotically optimal in the sense that, for any θ ∈ Θ \ {θ_0},

    sup_n E_{θ,n}[T_ASR(b) − n + 1 | T_ASR(b) ≥ n] − inf_{T(b) ∈ S(γ)} sup_n E_{θ,n}[T(b) − n + 1 | T(b) ≥ n] = O(log log γ).    (18)

A similar expression holds for T_ACM(b). The result means that, compared with any procedure (including the optimal procedure) calibrated to have a fixed ARL larger than γ, our procedure incurs at most a log(log γ) increase in the CADD. Comparing (18) with (16), we note that the ARL γ plays the same role as 1/α, because 1/γ is roughly the false-alarm rate in sequential change-point detection [8].

Example: Regret Bound for Specific Cases

In this subsection, we show that the regret bound R_t can be expressed as a weighted sum of Bregman divergences between two consecutive estimators. This form of R_t is useful for establishing the logarithmic regret of OMD. The following result is a modification of [16].

Theorem 3. Assume that X_1, X_2, ... are i.i.d. random variables with density function f_θ(x). Let η_i = 1/i in Algorithm 1. Assume that {θ̂_i}_{i≥1}, {μ̂_i}_{i≥1} are obtained using Algorithm 1 and θ̂_i = θ̃_i (defined in steps 7 and 8 of Algorithm 1) for any i ≥ 1. Then for any θ_0 ∈ Θ and t ≥ 1,

    R_t = Σ_{i=1}^t i B_{Φ*}(μ̂_i, μ̂_{i−1}) = (1/2) Σ_{i=1}^t i (μ̂_i − μ̂_{i−1})ᵀ [∇²Φ*(μ̃_i)] (μ̂_i − μ̂_{i−1}),

where μ̃_i = λ μ̂_i + (1 − λ) μ̂_{i−1} for some λ ∈ (0, 1).

Next, we apply Theorem 3 to a concrete example. The multivariate normal distribution, denoted by N(θ, I_d), is parameterized by an unknown mean parameter θ and a known covariance matrix I_d (a d × d identity matrix). Following the notation in Section 2.3, we have φ(x) = x, dH(x) = |2π I_d|^{−1/2} exp(−xᵀx/2) dx, Θ = Θ_s = ℝ^d for any s < 2, Φ(θ) = (1/2)θᵀθ, μ = θ and Φ*(μ) = (1/2)μᵀμ, where |·| denotes the determinant of a matrix, and H is a probability measure under which the sample follows N(0, I_d). When the covariance matrix is known to be some Σ ≠ I_d, one can whiten the vectors by multiplying by Σ^{−1/2} to reduce to the situation here.

Corollary 4 (Upper bound on the expected regret, Gaussian). Assume X_1, X_2, ... are i.i.d. following N(θ, I_d) for some θ ∈ ℝ^d. Assume that {θ̂_i}_{i≥1}, {μ̂_i}_{i≥1} are obtained using Algorithm 1 with η_i = 1/i and Γ = ℝ^d. Then, for some constant C > 0 that depends on θ, we have

    E_{θ,0}[R_t] ≤ C d log t,   t ≥ 2.
The following calculations justify Corollary 4 and also serve as an example of how to use the regret bound. First, the assumption θ̂_t = θ̃_t in Theorem 3 is satisfied for the following reason: since Γ = ℝ^d is the full space, using the non-negativity of the Bregman divergence, Algorithm 1 gives θ̂_t = arg min_{u∈Γ} B_Φ(u, θ̃_t) = θ̃_t. The regret can then be written as

    R_t = (1/2)(μ̂_1 − μ̂_0)ᵀ(μ̂_1 − μ̂_0) + (1/2) Σ_{i=2}^t i (μ̂_i − μ̂_{i−1})ᵀ(μ̂_i − μ̂_{i−1})
        = (1/2)(X_1 − θ_0)ᵀ(X_1 − θ_0) + (1/2) Σ_{i=2}^t (μ̂_i − μ̂_{i−1})ᵀ(φ(X_i) − μ̂_{i−1}).

Since the step size is η_i = 1/i, we have μ̂_i − μ̂_{i−1} = (φ(X_i) − μ̂_{i−1})/i, so the second term in the above equation equals

    (1/2) Σ_{i=2}^t (1/i) ‖X_i − μ̂_{i−1}‖².

Expanding the squares and telescoping the resulting ‖μ̂_i‖² − ‖μ̂_{i−1}‖² terms, this sum is bounded by Σ_{i=2}^t (1/(2(i−1))) ‖X_i‖² plus lower-order terms. Combining the above, we have

    E_{θ,0}[R_t] ≤ (1/2) E_{θ,0}[(X_1 − θ_0)ᵀ(X_1 − θ_0)] + Σ_{i=2}^t (1/(2(i−1))) E_{θ,0}[‖X_i‖²] + (1/2) E_{θ,0}[‖X_1‖²].

Finally, since E_{θ,0}[‖X_i‖²] = d + ‖θ‖² for any i, we obtain the desired result. Thus, with i.i.d. multivariate normal samples, the expected regret grows logarithmically with the number of samples.

Using similar calculations, we can also bound the expected regret in the general case. As shown in the proof of Corollary 4 above, the dominating term of R_t can be rewritten as

    Σ_{i=2}^t (1/(2(i−1))) (φ(X_i) − μ̂_{i−1})ᵀ [∇²Φ*(μ̃_i)] (φ(X_i) + μ̂_{i−1}),

where μ̃_i is a convex combination of μ̂_i and μ̂_{i−1}. For an arbitrary distribution, the term (φ(X_i) − μ̂_{i−1})ᵀ [∇²Φ*(μ̃_i)] (φ(X_i) + μ̂_{i−1}) can be viewed as a local norm with changing curvature ∇²Φ*(μ̃_i). Thus, it is possible to prove O(log t)-style bounds case by case, by making more assumptions about the distributions. Recall the notation Θ_s from Section 2.3, over which −log f_θ(x) is s-strongly convex. Let ‖·‖₂ denote the ℓ2 norm. Moreover, assume that the true parameter belongs to a set Γ that is a closed and convex subset of Θ_s such that sup_{θ∈Γ} ‖∇Φ(θ)‖₂ ≤ M for some constant M. One can then show that −log f_θ(x) is not only s-strongly convex but also M-strongly smooth over Γ. Theorem 3 in [10] shows that if {θ̂_i}_{i≥1} is obtained by OMD, then for all θ ∈ Γ and n ≥ 1,

    E_{θ,0}[R_n] ≤ (M / (2s)) E_{θ,0}[max_{1≤i≤n} ‖X_i‖₂²] (log n + 1).

Therefore, for any bounded distribution within the exponential family, we achieve a logarithmic regret.
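The logarithmic growth can also be checked empirically. The sketch below uses the Gaussian setting of Corollary 4 with d = 1 and θ̂_0 = θ_0 = 0, where the OMD estimator with step size η_i = 1/i reduces to the running sample mean; the function name is ours, not from the paper:

```python
import numpy as np

def gaussian_omd_regret(x, theta0=0.0):
    """Empirical regret of OMD with step size eta_i = 1/i on N(theta, 1) data.

    With eta_i = 1/i the OMD estimator is the running sample mean (started at
    the null parameter theta0), and the per-step loss is the negative
    log-likelihood up to constants: l_i(u) = (x_i - u)**2 / 2.
    """
    mu_hat = theta0
    cum_loss = 0.0
    for i, xi in enumerate(x, start=1):
        cum_loss += 0.5 * (xi - mu_hat) ** 2   # predict x_i with mu_hat_{i-1}
        mu_hat += (xi - mu_hat) / i            # OMD update with eta_i = 1/i
    best_fixed = 0.5 * np.sum((x - np.mean(x)) ** 2)  # best fixed parameter in hindsight
    return cum_loss - best_fixed

rng = np.random.default_rng(0)
for n in (100, 1000, 10000):
    x = rng.normal(1.0, 1.0, size=n)
    # the ratio should stay roughly constant if the regret grows like C log n
    print(n, gaussian_omd_regret(x) / np.log(n))
```

On simulated data the ratio regret / log n stabilizes around a constant, consistent with the C d log t bound of Corollary 4.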
This logarithmic regret is valid for the Bernoulli distribution, the Beta distribution, and truncated versions of classic distributions (e.g., the truncated Gaussian, truncated Gamma, and truncated geometric distributions analyzed in [44]).

4. Numerical Examples

In this section, we present synthetic examples to demonstrate the good performance of our methods. We focus on ACM and ASR for sequential change-point detection. In the following, we consider the window-limited versions (see Remark 1) of ACM and ASR with window size w = 100. Recall that when the measure P_∞ is known, sup_{n≥0} E_{θ,n}[T − n | T > n] is attained at n = 0 for both the ASR and ACM procedures (a proof can be found in the proof of Theorem 2). Therefore, in the following experiments we define the expected detection delay (EDD) as E_{θ,0}[T] for a stopping time T. To compare the performance of different detection procedures, we determine the threshold for each procedure by Monte Carlo simulation such that the ARL of each procedure is about 10,000. Below,

we denote by ‖·‖₂, ‖·‖₁ and ‖·‖₀ the ℓ2 norm, the ℓ1 norm, and the ℓ0 norm (the number of non-zero entries), respectively. The following experiments are all run on the same MacBook Air with an Intel i7 Core CPU.

4.1. Detecting a Sparse Mean Shift in a Multivariate Normal Distribution

We consider detecting a sparse mean shift in a multivariate normal distribution. Specifically, we assume that the pre-change distribution is N(0, I_d) and the post-change distribution is N(θ, I_d) for some unknown θ ∈ {θ ∈ ℝ^d : ‖θ‖₀ ≤ s}, where s is called the sparsity of the mean shift. Sparse mean-shift detection is of particular interest in sensor networks [45,46]. In this Gaussian case, the Bregman divergence is given by B_Φ(θ_1, θ_2) = I(θ_2, θ_1) = ‖θ_1 − θ_2‖₂²/2. Therefore, the projection onto Γ in Algorithm 1 is a Euclidean projection onto a convex set, which in many cases can be implemented efficiently. As a frequently used convex relaxation of the ℓ0-norm ball, we set Γ = {θ : ‖θ‖₁ ≤ s} (it is known that imposing an ℓ1 constraint leads to sparse solutions; see, e.g., [47]). The projection onto the ℓ1 ball can then be computed very efficiently via a simple soft-thresholding technique [48].

Two benchmark procedures are CUSUM and GLR. For the CUSUM procedure, we specify a nominal post-change mean, which is an all-one vector. If the post-change mean is known to be sparse, we can also use the shrinkage estimator presented in [49], which performs hard- or soft-thresholding of the estimated post-change mean parameter. Our procedures are T_ASR(b) and T_ACM(b) with Γ = ℝ^d and Γ = {θ : ‖θ‖₁ ≤ 5}. In the following experiments, we run 10,000 Monte Carlo trials to obtain each simulated EDD. We set d = 20. The post-change distributions are N(θ, I_d), where 100p% of the entries of θ are 1 and the others are 0, and the locations of the non-zero entries are random. Table 1 shows the EDDs versus the proportion p. Note that our procedures incur little performance loss compared with the GLR and CUSUM procedures.
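The soft-thresholding projection onto the ℓ1 ball mentioned above can be sketched as follows (a standard sort-based routine; the function name and interface are ours, not from the paper):

```python
import numpy as np

def project_l1_ball(v, s):
    """Euclidean projection of v onto the l1 ball {x : ||x||_1 <= s}.

    Sort-based O(d log d) routine; the result is soft-thresholding of v
    with a data-dependent threshold theta.
    """
    if np.abs(v).sum() <= s:
        return v.copy()                      # already feasible
    u = np.sort(np.abs(v))[::-1]             # sorted magnitudes, descending
    css = np.cumsum(u)
    # largest index rho with u[rho] * (rho + 1) > css[rho] - s
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - s)[0][-1]
    theta = (css[rho] - s) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

print(project_l1_ball(np.array([3.0, 1.0]), 2.0))  # prints [2. 0.]
```

Each OMD iteration can then call this projection after the mirror-descent step, which keeps the running estimate inside Γ = {θ : ‖θ‖₁ ≤ s}.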
Notably, T_ACM(b) with Γ = {θ : ‖θ‖₁ ≤ 5} performs almost the same as the GLR procedure and much better than the CUSUM procedure when p is small. This shows the advantage of projection when the true parameter is sparse.

Table 1. Comparison of the EDDs in detecting a sparse mean shift of a multivariate Gaussian distribution. Below, "CUSUM": CUSUM procedure with a pre-specified all-one vector as the post-change parameter; "Shrinkage": component-wise shrinkage estimator in [49]; "GLR": GLR procedure; "ASR": T_ASR(b) with Γ = ℝ^d; "ACM": T_ACM(b) with Γ = ℝ^d; "ASR-ℓ1": T_ASR(b) with Γ = {θ : ‖θ‖₁ ≤ 5}; "ACM-ℓ1": T_ACM(b) with Γ = {θ : ‖θ‖₁ ≤ 5}. p is the proportion of non-zero entries in θ. We run 10,000 Monte Carlo trials to obtain each value. For each value, the standard deviation is less than one half of the value. [The numerical entries are not recoverable from the source; columns correspond to p = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6 and rows to CUSUM, Shrinkage, GLR, ASR, ACM, ASR-ℓ1, ACM-ℓ1.]

4.2. Detecting a Scale Change in a Gamma Distribution

We consider an example of detecting a scale change in Gamma distributions. Assume that we observe a sequence X_1, X_2, ... of samples drawn from Gamma(α, β) for some α, β > 0, with probability density function f_{α,β}(x) = exp(−xβ) x^{α−1} β^α / Γ(α) (to avoid confusion with the Γ parameter in Algorithm 1, we use Γ(·) to denote the Gamma function). The parameter α is called the dispersion parameter, which scales the loss and the divergences. For simplicity, we fix α = 1, just as we fixed the variance in the Gaussian case. The specifications in Algorithm 1 are as follows: θ = −β,

Θ = (−∞, 0), φ(x) = x, dH(x) = 1, Φ(θ) = −log(−θ), μ = −1/θ and Φ*(μ) = −log μ − 1. Assume that the pre-change distribution is Gamma(1, 1) and the post-change distribution is Gamma(1, β) for some unknown β > 0. We compare our algorithms with CUSUM, GLR, and the non-anticipating estimator based on the method-of-moments (MOM) estimator in [8]. For the CUSUM procedure, we specify the post-change β to be 2. The results are shown in Table 2. CUSUM fails to detect the change when β = 0.1, which is far away from the pre-specified post-change parameter β = 2. We can see that the performance loss of the proposed ACM method compared with GLR and MOM is very small.

Table 2. Comparison of the EDDs in detecting a scale change in a Gamma distribution. Below, "CUSUM": CUSUM procedure with pre-specified post-change parameter β = 2; "MOM": method-of-moments estimator; "GLR": GLR procedure; "ASR": T_ASR(b) with Γ = (−∞, 0); "ACM": T_ACM(b) with Γ = (−∞, 0). We run 10,000 Monte Carlo trials to obtain each value. For each value, the standard deviation is less than one half of the value. [The numerical entries are not recoverable from the source; columns correspond to β = 0.1, 0.5, 2, 5, 10, and the CUSUM entry for β = 0.1 is NaN.]

4.3. Communication-Rate Change Detection with an Erdős–Rényi Model

Next, we consider the problem of detecting a change in the communication rates of a network, which is a model for social network data. Suppose we observe communications between nodes in a network over time, represented as a sequence of (symmetric) adjacency matrices. At time t, if node i and node j communicate, then the adjacency matrix has 1 in its ij-th and ji-th entries (thus it forms an undirected graph). The entries corresponding to nodes that do not communicate are 0. We model such communication patterns using the Erdős–Rényi random graph model: each edge has a fixed probability of being present or absent, independently of the other edges. Under the null hypothesis, each edge is a Bernoulli random variable that takes value 1 with known probability p and value 0 with probability 1 − p.
Under the alternative hypothesis, there exists an unknown time κ after which a small subset of edges occurs with an unknown and different probability p′ ≠ p. In the experiments, we set N = 20 nodes and d = 190 edges. For the pre-change parameters, we set p_i = 0.2 for all i = 1, ..., d. For the post-change parameters, we randomly select n₁ out of the 190 edges, denoted by E, and set p_i = 0.8 for i ∈ E and p_i = 0.2 for i ∉ E. As before, we let the change happen at time n = 0 (since the upper bound for the EDD is achieved at n = 0, as argued in the proof of Theorem 2). To implement CUSUM, we specify the post-change parameters p_i = 0.8 for all i = 1, ..., d. The results are shown in Table 3. Our procedures are better than the CUSUM procedure when n₁ is small, since the post-change parameters used in the CUSUM procedure are then far from the true parameters. Compared with the GLR procedure, our methods have a small performance loss, and the loss is almost negligible as n₁ approaches d = 190.

Table 3. Comparison of the EDDs in detecting changes of the communication rates in a network. Below, "CUSUM": CUSUM procedure with pre-specified post-change parameters p = 0.8; "GLR": GLR procedure; "ASR": T_ASR(b) with Γ = ℝ; "ACM": T_ACM(b) with Γ = ℝ. We run 10,000 Monte Carlo trials to obtain each value. For each value, the standard deviation is less than one half of the value. [The numerical entries are not recoverable from the source; columns correspond to n₁ = 78, 100, 120, 150, 170, 190 and rows to CUSUM, GLR, ASR, ACM.]

Below are the specifications of Algorithm 1 in this case. For a Bernoulli distribution with unknown parameter p, the natural parameter θ equals log(p/(1 − p)). Thus, we have Θ = ℝ, φ(x) = x, dH(x) = 1, Φ(θ) = log(1 + exp(θ)), μ = exp(θ)/(1 + exp(θ)) and Φ*(μ) = μ log μ + (1 − μ) log(1 − μ).

4.4. Point-Process Change-Point Detection: Poisson to Hawkes Processes

In this example, to illustrate the situation in Section 1.2, we consider a case where a homogeneous Poisson process switches to a Hawkes process (see, e.g., [12]); this can be viewed as the simplest case of Section 1.2 with one node. We construct the ACM and ASR procedures. In this case, the MLE for the unknown post-change parameter cannot be found in closed form, yet ACM and ASR can easily be constructed and give reasonably good performance, although our theory no longer holds here due to the lack of i.i.d. samples.

The Hawkes process can be viewed as a non-homogeneous Poisson process whose intensity is influenced by historical events. The data consist of a sequence of events occurring at times {t_1, t_2, ..., t_n} before a time horizon T: t_i ≤ T. Assume the intensity of the Poisson process is λ_s, s ∈ (0, T), and there may exist a change-point κ ∈ (0, T) at which the process changes. The null and alternative hypothesis tests are

    H_0: λ_s = μ, 0 < s < T;
    H_1: λ_s = μ, 0 < s < κ;  λ_s = μ + θ Σ_{κ < t_j < s} ϕ(s − t_j), κ < s < T,

where μ is a known baseline intensity, θ > 0 is the unknown magnitude of the change, and ϕ(s) = β e^{−βs} is the normalized kernel function with pre-specified parameter β > 0, which captures the influence of past events. We treat the post-change influence parameter θ as unknown, since it represents an anomaly.

We first use a sliding window to convert the event times into a sequence of vectors with overlapping events. Assume the size of the sliding window is L. For a given scanning time T_i ≤ T, we map all the events in [T_i − L, T_i] to a vector X_i = [t_(1), ..., t_(m_i)], t_(i) ∈ [T_i − L, T_i], where m_i is the number of events falling into the window. Note that X_i can have different lengths for different i.
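As a quick numeric check of the Bernoulli specification given earlier in this subsection (the function name is ours), the conjugate pair (Φ, Φ*) should satisfy the Legendre–Fenchel identity Φ*(μ) = θμ − Φ(θ) at μ = ∇Φ(θ):

```python
import math

def bernoulli_duality_gap(p):
    """Bernoulli(p) in natural/mean parameterization; returns the
    Legendre-Fenchel duality gap, which should be ~0."""
    theta = math.log(p / (1.0 - p))                   # natural parameter
    log_partition = math.log(1.0 + math.exp(theta))   # Phi(theta)
    mu = math.exp(theta) / (1.0 + math.exp(theta))    # mean parameter, grad Phi(theta)
    dual = mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu)  # Phi*(mu)
    # duality: Phi*(mu) = theta * mu - Phi(theta) at mu = grad Phi(theta)
    return abs(dual - (theta * mu - log_partition))

print(bernoulli_duality_gap(0.2))  # ~0 up to floating-point error
```

The same pattern (natural parameter, log-partition, mean map, dual) is what Algorithm 1 needs for any exponential-family member.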
Consider a set of scanning times T_1, T_2, ..., T_t. This maps the event times into a sequence of vectors X_1, X_2, ..., X_t of lengths m_1, m_2, ..., m_t. The scanning times can be arbitrary; here we set them to be the event times, so that there is at least one sample per sliding window. For a hypothetical change-point location κ, it can be shown that the log-likelihood ratio (between the Hawkes process and the Poisson process), as a function of θ, is given by

    ℓ(θ | X_i) = Σ_{t_q ∈ (T_i−L, T_i]} log[μ + θ Σ_{t_j ∈ (T_i−L, t_q)} β e^{−β(t_q − t_j)}] − μL − θ Σ_{t_q ∈ (T_i−L, T_i]} [1 − e^{−β(T_i − t_q)}].    (19)

Based on this sliding-window approach, we can approximate the original change-point detection problem as follows. Without a change, X_1, ..., X_t are sampled from a Poisson process. Under the alternative, the change occurs at some time such that X_1, ..., X_κ are sampled from a Poisson process and X_{κ+1}, ..., X_t are sampled from a Hawkes process with parameter θ, rather than a Poisson process. We define the estimator of θ, for an assumed change-point location k, as

    θ̂_{k,i} := θ̂_{k,i}(X_k, ..., X_i) = θ̂_{k,i}(t_ℓ ∈ [T_k, T_i]).    (20)

Now, consider k ∈ [i − w, i − 1], and keep w estimators: θ̂_{i−w,i}, ..., θ̂_{i−1,i}. The update of each estimator is based on stochastic gradient descent. Taking the derivative with respect to θ, we have

    ∂ℓ(θ | X_i)/∂θ = Σ_{t_q ∈ (T_i−L, T_i]} [Σ_{t_j ∈ (T_i−L, t_q)} β e^{−β(t_q − t_j)}] / [μ + θ Σ_{t_j ∈ (T_i−L, t_q)} β e^{−β(t_q − t_j)}] − Σ_{t_q ∈ (T_i−L, T_i]} [1 − e^{−β(T_i − t_q)}].

Note that there is no closed-form expression for the MLE, which is the root of the above equation. We therefore perform a stochastic-gradient step (an ascent step on the log-likelihood ratio ℓ) instead:

    θ̂_{k,i+1} = θ̂_{k,i} + γ · ∂ℓ(θ | X_{i+1})/∂θ |_{θ = θ̂_{k,i}},   k = i − w + 1, ..., i,

where γ > 0 is the step size. We can now apply the ACM and ASR procedures, using the fact that log(f_{θ̂_{k,t}}(X_{t+1}) / f_{θ_0}(X_{t+1})) = ℓ(θ̂_{k,t} | X_{t+1}), computed via (19). Table 4 shows the EDD for different values of θ. Here we choose the threshold such that the ARL is 5000. We see that the scheme has reasonably good performance: the detection delay decreases as the true signal strength θ increases.

Table 4. Point-process change-point detection: EDD of the ACM and ASR procedures for various values of the true θ; the ARL of each procedure is controlled to be 5000 by selecting the threshold via Monte Carlo simulation. [The numerical entries are not recoverable from the source; columns correspond to θ = 0.4, 0.5, 0.6, 0.7 and rows to ACM and ASR.]

5. Conclusions

In this paper, we considered sequential hypothesis testing and change-point detection with computationally efficient one-sample-update schemes obtained from online mirror descent. We showed that the loss of statistical efficiency caused by using the online mirror descent estimator (in place of the exact maximum likelihood estimator computed from the complete historical data) is related to the regret incurred by the online convex optimization procedure. The result generalizes to any estimation method with a logarithmic regret bound, and it sheds light on the relationship between statistical detection procedures and online convex optimization.

Acknowledgments: This research was supported in part by National Science Foundation (NSF) grants NSF CCF, CMMI, and NSF CAREER CCF to Yao Xie. We would like to thank the anonymous reviewers for their insightful comments.

Author Contributions: Yang Cao, Yao Xie, and Huan Xu conceived the idea and performed the theoretical part of the paper; Liyan Xie helped with the numerical examples of the manuscript.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A.
Proofs

Proof of Theorem 1. In the proof, for simplicity of notation, we use N to denote τ(b). Recall that θ is the true parameter. Define

    S_t^θ = Σ_{i=1}^t log(f_θ(X_i) / f_{θ_0}(X_i)).

Then, under the measure P_{θ,0}, S_t^θ is a random walk with i.i.d. increments, and by Wald's identity (e.g., [1]) we have

    E_{θ,0}[S_N^θ] = E_{θ,0}[N] · I(θ, θ_0).    (A1)

On the other hand, let θ*_N denote the MLE based on (X_1, ..., X_N). The key to the proof is to decompose the stopped process S_N^θ into a sum of three terms as follows:

    S_N^θ = Σ_{i=1}^N log(f_θ(X_i)/f_{θ*_N}(X_i)) + Σ_{i=1}^N log(f_{θ*_N}(X_i)/f_{θ̂_{i−1}}(X_i)) + Σ_{i=1}^N log(f_{θ̂_{i−1}}(X_i)/f_{θ_0}(X_i)).    (A2)

Note that the first term of the decomposition on the right-hand side of (A2) is always non-positive, since

    Σ_{i=1}^N log(f_θ(X_i)/f_{θ*_N}(X_i)) = Σ_{i=1}^N log f_θ(X_i) − sup_{θ'∈Θ} Σ_{i=1}^N log f_{θ'}(X_i) ≤ 0.

Therefore we have

    E_{θ,0}[S_N^θ] ≤ E_{θ,0}[Σ_{i=1}^N log(f_{θ*_N}(X_i)/f_{θ̂_{i−1}}(X_i))] + E_{θ,0}[Σ_{i=1}^N log(f_{θ̂_{i−1}}(X_i)/f_{θ_0}(X_i))].

Now consider the third term in the decomposition (A2). Similar to the proof of Equation (5.109) in [4], we claim that its expectation under the measure P_{θ,0} is upper bounded by b/I(θ, θ_0) + O(1) as b → ∞. Next, we prove the claim. For any positive integer n, we further decompose the third term in (A2) as

    Σ_{i=1}^n log(f_{θ̂_{i−1}}(X_i)/f_{θ_0}(X_i)) = M_n(θ) − G_n(θ) + m_n(θ, θ_0) + n I(θ, θ_0),    (A3)

where

    M_n(θ) = Σ_{i=1}^n log(f_{θ̂_{i−1}}(X_i)/f_θ(X_i)) + G_n(θ),   G_n(θ) = Σ_{i=1}^n I(θ, θ̂_{i−1}),
    m_n(θ, θ_0) = Σ_{i=1}^n log(f_θ(X_i)/f_{θ_0}(X_i)) − n I(θ, θ_0).

The decomposition (A3) involves the stochastic processes {M_n(θ)} and {m_n(θ, θ_0)}, which are both P_{θ,0}-martingales with zero expectation, i.e., E_{θ,0}[M_n(θ)] = E_{θ,0}[m_n(θ, θ_0)] = 0 for any positive integer n. Since, for the exponential family, the log-partition function Φ(θ) is bounded, by the inequalities for martingales [47] we have

    E_{θ,0}|M_n(θ)| ≤ C_1 √n,   E_{θ,0}|m_n(θ, θ_0)| ≤ C_2 √n,    (A4)

where C_1 and C_2 are absolute constants that do not depend on n. Moreover, we observe that for all θ ∈ Θ,

    E_{θ,0}[G_n(θ)] ≤ E_{θ,0}[max_{θ̃∈Θ} G_n(θ̃)] = E_{θ,0}[R_n(θ)] ≤ C log n.

Therefore, applying (A4), we have that n⁻¹G_n(θ), n⁻¹M_n(θ) and n⁻¹m_n(θ, θ_0) converge to 0 almost surely. Moreover, the convergence is P_{θ,0}-r-quick for r = 1. We say that n⁻¹A_n converges P_{θ,0}-r-quickly to a constant I if E_{θ,0}[G(ε)]^r < ∞ for all ε > 0, where G(ε) = sup{n ≥ 1 : |n⁻¹A_n − I| > ε} is the last time n⁻¹A_n leaves the interval [I − ε, I + ε] (for more details, we refer the reader to [4]).
Therefore, dividing both sides of (A3) by n, we obtain that n⁻¹ Σ_{i=1}^n log(f_{θ̂_{i−1}}(X_i)/f_{θ_0}(X_i)) converges P_{θ,0}-1-quickly to I(θ, θ_0). For ε > 0, we now define the last entry time

    L(ε) = sup{n ≥ 1 : |n I(θ, θ_0) − Σ_{i=1}^n log(f_{θ̂_{i−1}}(X_i)/f_{θ_0}(X_i))| > εn}.

By the definition of P_{θ,0}-1-quick convergence and the finiteness of I(θ, θ_0), we have that E_{θ,0}[L(ε)] < +∞ for all ε > 0. In the following, define the scaled threshold b̃ = b/I(θ, θ_0). Observe that, conditioning on the event {L(ε) + 1 < N < +∞}, we have

    (1 − ε)(N − 1) I(θ, θ_0) < Σ_{i=1}^{N−1} log(f_{θ̂_{i−1}}(X_i)/f_{θ_0}(X_i)) < b.

Therefore, conditioning on the event {L(ε) + 1 < N < +∞}, we have N < 1 + b̃/(1 − ε). Hence, for any 0 < ε < 1,

    N ≤ 1 + I{N > L(ε) + 1} · b̃/(1 − ε) + I{N ≤ L(ε) + 1} · L(ε) ≤ 1 + b̃/(1 − ε) + L(ε).    (A5)

Since E_{θ,0}[L(ε)] < ∞ for any ε > 0, it follows from (A5) that the third term in (A2) is upper bounded by b̃ + O(1). Finally, the second term in (A2) can be written as

    Σ_{i=1}^N log(f_{θ*_N}(X_i)/f_{θ̂_{i−1}}(X_i)) = Σ_{i=1}^N [−log f_{θ̂_{i−1}}(X_i)] − inf_{θ∈Θ} Σ_{i=1}^N [−log f_θ(X_i)],

which is exactly the regret R_N defined in (2) for the online estimators, when the loss function is the negative log-likelihood. The theorem is then proved by combining the above analysis of the three terms in (A2) with (A1).

Proof of Corollary 1. First, we relate the expected regret at the stopping time to the expected stopping time, using the following chain of (in)equalities:

    E_{θ,0}[R_{τ(b)}] = E_{θ,0}[E_{θ,0}[R_n | τ(b) = n]] ≤ E_{θ,0}[C log τ(b)] ≤ C log E_{θ,0}[τ(b)],    (A6)

where the first equality uses iterated expectation, the first inequality uses the logarithmic-regret assumption in the statement of Corollary 1, and the second inequality follows from Jensen's inequality. Let a = (b + O(1))/I(θ, θ_0), β = C/I(θ, θ_0) and x = E_{θ,0}[τ(b)]. Applying (A6), the upper bound in Equation (14) becomes x ≤ a + β log x. From this we have x ≤ O(a). Taking logarithms on both sides and using the fact that max{a_1, a_2} ≤ a_1 + a_2 ≤ 2 max{a_1, a_2} for a_1, a_2 ≥ 0,

    log x ≤ max{log(2a), log(2β log x)} ≤ log a + O(log β).

Therefore, x ≤ a + β(log a + O(log β)). Using this argument, we obtain

    E_{θ,0}[τ(b)] ≤ b/I(θ, θ_0) + (C log b)/I(θ, θ_0) · (1 + o(1)).    (A7)

Note that a similar argument can be found in [49].
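The bootstrapping step above, which turns the implicit bound x ≤ a + β log x into the explicit bound behind (A7), can be illustrated numerically; the constants below are arbitrary stand-ins for (b + O(1))/I(θ, θ_0) and C/I(θ, θ_0), for illustration only:

```python
import math

a, beta = 1000.0, 5.0  # arbitrary stand-ins for (b + O(1))/I and C/I

x = a
for _ in range(100):
    x = a + beta * math.log(x)   # iterate the implicit bound x <= a + beta*log(x)

explicit = a + beta * math.log(a)  # the explicit first-plus-log-order bound
print(x, explicit)                 # the two agree up to a lower-order term
```

The fixed point of the implicit bound and the explicit value a + β log a differ only by a term of order β log β, matching the O(log β) slack in the argument.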
Next, we establish a few lemmas useful for proving Theorem 2 for the sequential detection procedures. Define a measure Q on (X^∞, B^∞) under which the probability density of X_i conditional on F_{i−1} is f_{θ̂_{i−1}}. Then, for any event A ∈ F_i, we have Q(A) = ∫_A L_i dP_∞. The following lemma shows that the restriction of Q to F_i is well defined.

Lemma A1. Let Q_i be the restriction of Q to F_i. Then for any A ∈ F_k and any i ≥ k, Q_i(A) = Q_k(A).

Proof of Lemma 1. To bound the term P_∞(τ(b) < ∞), we take advantage of the martingale property of L_t in (2). The main technique is a combination of a change of measure and Wald's likelihood ratio identity [1]. The proof combines results from [23] and [8], and the reader can find a complete proof in [23]; for completeness we reproduce those arguments here.

Define L̄_i = dP_i/dQ_i as the Radon–Nikodym derivative, where P_i and Q_i are the restrictions of P_∞ and Q to F_i, respectively. Then L̄_i = (L_i)⁻¹ for any i ≥ 1 (recall that L_i is defined in (2)). Combining Lemma A1 with Wald's likelihood ratio identity, we have

    P_∞(A ∩ {τ(b) < ∞}) = E_Q[I(A ∩ {τ(b) < ∞}) · L̄_{τ(b)}],   ∀A ∈ F_{τ(b)},    (A8)

where I(E) is the indicator function, equal to 1 for ω ∈ E and 0 otherwise. By the definition of τ(b), we have L̄_{τ(b)} ≤ exp(−b). Taking A to be the whole sample space in (A8) proves that P_∞(τ(b) < ∞) ≤ exp(−b).

Proof of Corollary 2. We use (5.80) and (5.88) in [4], which concern the asymptotic performance of open-ended tests. Since our problem is a special case of the problem treated in [4], we obtain

    inf_{T ∈ C(α)} E_{θ,0}[T] = log(1/α)/I(θ, θ_0) + (log log(1/α))/(2 I(θ, θ_0)) · (1 + o(1)).

Combining the above result with the right-hand side of (15) proves the corollary.

Proof of Theorem 2. From (A10), we have that for any n ≥ 1,

    E_{θ,n}[T_ASR(b) − n | T_ASR(b) > n] ≤ E_{θ,n}[T_ACM(b) − n | T_ACM(b) > n].

Therefore, to prove the theorem using Theorem 1, it suffices to show that

    sup_{n≥0} E_{θ,n}[T_ACM(b) − n | T_ACM(b) > n] ≤ E_{θ,0}[τ(b)].

Using an argument similar to the remarks in [8], the supremum of the detection delay over all change locations is achieved when the change occurs at the first instant:

    sup_{n≥0} E_{θ,n}[T_ACM(b) − n | T_ACM(b) > n] = E_{θ,0}[T_ACM(b)].    (A9)

This is a slight modification (a small change of subscripts) of the remarks in [8], but for completeness and clarity we give the details. Notice that, since θ_0 is known, for any j ≥ 1 the distribution of {max_{j+1≤k≤t} L_{k,t}}_{t=j+1}^∞ under P_{θ,j}, conditional on F_j, is the same as the distribution of {max_{1≤k≤t} L_{k,t}}_{t=1}^∞ under P_{θ,0}. Below, we use a renewal property of the ACM procedure. Define

    T_ACM^{(j)}(b) = inf{t > j : max_{j+1≤k≤t} log L_{k,t} > b}.

Then we have

    E_{θ,0}[T_ACM(b)] = E_{θ,j}[T_ACM^{(j)}(b) − j | T_ACM^{(j)}(b) > j].
However, max_{j+1≤k≤t} log L_{k,t} ≤ max_{1≤k≤t} log L_{k,t} for any t > j, so T_ACM^{(j)}(b) ≥ T_ACM(b) conditioning on {T_ACM(b) > j}. Hence

    E_{θ,0}[T_ACM(b)] = E_{θ,j}[T_ACM^{(j)}(b) − j | T_ACM^{(j)}(b) > j] ≥ E_{θ,j}[T_ACM(b) − j | T_ACM(b) > j].

Thus, to prove (A9), it suffices to show that E_{θ,0}[T_ACM(b)] ≤ E_{θ,0}[τ(b)]. To show this, define τ(b)^{(ℓ)} as the stopping time obtained by applying the one-sided sequential hypothesis testing procedure τ(b) to the data {X_i}_{i=ℓ}^∞. Then in fact T_ACM(b) = min_{ℓ≥1} {τ(b)^{(ℓ)} + ℓ − 1}; this relationship was developed in [8]. Thus T_ACM(b) ≤ τ(b)^{(1)} + 1 − 1 = τ(b), and E_{θ,0}[T_ACM(b)] ≤ E_{θ,0}[τ(b)].

Proof of Lemma 2. This is a classic result proved using the martingale property; the proof routine can be found in many textbooks, such as [4]. First, rewrite T_ASR(b) as

    T_ASR(b) = inf{t ≥ 1 : log(Σ_{k=1}^t L_{k,t}) > b}.

Next, since

    log(Σ_{k=1}^t L_{k,t}) > log(max_{1≤k≤t} L_{k,t}) = max_{1≤k≤t} log L_{k,t},    (A10)

we have E_∞[T_ACM(b)] ≥ E_∞[T_ASR(b)]. So it suffices to show that E_∞[T_ASR(b)] ≥ γ if b ≥ log γ. Define R̃_t = Σ_{k=1}^t L_{k,t} − t. A direct computation shows that

    E_∞[R̃_t | F_{t−1}] = E_∞[L_{t,t} + Σ_{k=1}^{t−1} L_{k,t} − t | F_{t−1}]
        = 1 + Σ_{k=1}^{t−1} L_{k,t−1} · E_∞[f_{θ̂_{t−1}}(X_t)/f_{θ_0}(X_t) | F_{t−1}] − t
        = 1 + Σ_{k=1}^{t−1} L_{k,t−1} − t = R̃_{t−1}.

Therefore, {R̃_t} is a (P_∞, F_t)-martingale with zero mean. Suppose that E_∞[T_ASR(b)] < ∞ (otherwise the statement is trivial); then we have

    Σ_{t=1}^∞ P_∞(T_ASR(b) ≥ t) < ∞.    (A11)

Equation (A11) implies that P_∞(T_ASR(b) ≥ t) = o(1/t). Combining this with the fact that 0 ≤ Σ_{k=1}^t L_{k,t} ≤ exp(b) conditioning on the event {T_ASR(b) > t}, so that |R̃_t| ≤ exp(b) + t there, we have

    lim inf_{t→∞} ∫_{{T_ASR(b)>t}} |R̃_t| dP_∞ ≤ lim inf_{t→∞} (exp(b) + t) P_∞(T_ASR(b) ≥ t) = 0.

Therefore, we can apply the optional stopping theorem for martingales to obtain E_∞[R̃_{T_ASR(b)}] = 0, that is, E_∞[Σ_{k=1}^{T_ASR(b)} L_{k,T_ASR(b)}] = E_∞[T_ASR(b)]. By the definition of T_ASR(b), Σ_{k=1}^{T_ASR(b)} L_{k,T_ASR(b)} > exp(b), so E_∞[T_ASR(b)] > exp(b). Therefore, if b ≥ log γ, we have E_∞[T_ACM(b)] ≥ E_∞[T_ASR(b)] ≥ γ.

Proof of Corollary 3. Our Theorem 1 and the remarks in [15] show that the minimum worst-case detection delay, at a fixed ARL level γ, is

    inf_{T(b) ∈ S(γ)} sup_n E_{θ,n}[T(b) − n + 1 | T(b) ≥ n] = log γ / I(θ, θ_0) + (d log log γ)/(2 I(θ, θ_0)) · (1 + o(1)).    (A12)

It can be shown that the infimum is attained by choosing T(b) as a weighted Shiryayev–Roberts detection procedure, with a careful choice of the weight over the parameter space Θ. Combining (A12) with the right-hand side of (15) proves the corollary.

The following derivation borrows ideas from [16]. First, we derive concise forms of the two terms in the definition of R_t in (2).

Lemma A2. Assume that X_1, X_2, ... are i.i.d. random variables with density function f_θ(x), and assume the decreasing step size η_i = 1/i in Algorithm 1. Let {θ̂_i}_{i≥1}, {μ̂_i}_{i≥1} be generated by Algorithm 1. If θ̂_i = θ̃_i for any i ≥ 1, then for any null-distribution parameter θ_0 ∈ Θ and t ≥ 1,

    Σ_{i=1}^t [−log f_{θ̂_{i−1}}(X_i)] = Σ_{i=1}^t i B_{Φ*}(μ̂_i, μ̂_{i−1}) − t Φ*(μ̂_t).    (A13)

Moreover, for any t ≥ 1,

    inf_{θ∈Θ} Σ_{i=1}^t [−log f_θ(X_i)] = −t Φ*(μ̄_t),    (A14)

where μ̄_t = (1/t) Σ_{i=1}^t φ(X_i).

Subtracting (A14) from (A13), we obtain the following result, which shows that the regret can be represented as a weighted sum of Bregman divergences between two consecutive estimators.

Proof of Lemma A2. By the definition of the Legendre–Fenchel dual function, we have Φ*(μ) = θᵀμ − Φ(θ) whenever μ = ∇Φ(θ), θ ∈ Θ. By this definition, and choosing η_i = 1/i, we have for any i ≥ 1

    −log f_{θ̂_{i−1}}(X_i) = Φ(θ̂_{i−1}) − θ̂_{i−1}ᵀ φ(X_i) = −θ̂_{i−1}ᵀ (φ(X_i) − μ̂_{i−1}) − Φ*(μ̂_{i−1}) = −(1/η_i) θ̂_{i−1}ᵀ (μ̂_i − μ̂_{i−1}) − Φ*(μ̂_{i−1})
        = (1/η_i) B_{Φ*}(μ̂_i, μ̂_{i−1}) + (1/η_{i−1}) Φ*(μ̂_{i−1}) − (1/η_i) Φ*(μ̂_i),    (A15)

where we used the update rule in line 6 of Algorithm 1 and the assumption θ̂_i = θ̃_i to obtain the third equality, and we define 1/η_0 = 0. Summing (A15) over i from 1 to t, where the last two terms form a telescoping series, we have

    Σ_{i=1}^t [−log f_{θ̂_{i−1}}(X_i)] = Σ_{i=1}^t (1/η_i) B_{Φ*}(μ̂_i, μ̂_{i−1}) − (1/η_t) Φ*(μ̂_t) = Σ_{i=1}^t i B_{Φ*}(μ̂_i, μ̂_{i−1}) − t Φ*(μ̂_t).

Moreover, from the definition we have

    Σ_{i=1}^t [−log f_θ(X_i)] = Σ_{i=1}^t [Φ(θ) − θᵀ φ(X_i)].

Taking the derivative of Σ_{i=1}^t [−log f_θ(X_i)] with respect to θ and setting it to 0, we find the stationary point, at which

    ∇Φ(θ) = (1/t) Σ_{i=1}^t φ(X_i) = μ̄_t.

Similarly, using the expression for the dual function and plugging μ̄_t back into the equation, we have

    inf_{θ∈Θ} Σ_{i=1}^t [−log f_θ(X_i)] = t (Φ(θ) − θᵀ μ̄_t) = −t Φ*(μ̄_t).

Proof of Theorem 3. By choosing the step size η_i = 1/i for every i ≥ 1 in Algorithm 1, and assuming θ̂_i = θ̃_i for every i ≥ 1, we have by induction that

    μ̂_t = (1/t) Σ_{i=1}^t φ(X_i) = μ̄_t.

Subtracting (A14) from (A13), we obtain

    R_t = Σ_{i=1}^t [−log f_{θ̂_{i−1}}(X_i)] − inf_{θ∈Θ} Σ_{i=1}^t [−log f_θ(X_i)]
        = Σ_{i=1}^t i B_{Φ*}(μ̂_i, μ̂_{i−1}) − t Φ*(μ̂_t) + t Φ*(μ̄_t)
        = Σ_{i=1}^t i B_{Φ*}(μ̂_i, μ̂_{i−1})
        = Σ_{i=1}^t i [Φ*(μ̂_i) − Φ*(μ̂_{i−1}) − ⟨∇Φ*(μ̂_{i−1}), μ̂_i − μ̂_{i−1}⟩]
        = (1/2) Σ_{i=1}^t i (μ̂_i − μ̂_{i−1})ᵀ [∇²Φ*(μ̃_i)] (μ̂_i − μ̂_{i−1}).

The final equality is obtained by a Taylor expansion.

References

1. Siegmund, D. Sequential Analysis: Tests and Confidence Intervals; Springer: Berlin, Germany, 1985.
2. Chen, J.; Gupta, A.K. Parametric Statistical Change Point Analysis; Birkhäuser: Basel, Switzerland.
3. Siegmund, D. Change-points: From sequential detection to biology and back. Seq. Anal. 2013, 23.
4. Tartakovsky, A.; Nikiforov, I.; Basseville, M. Sequential Analysis: Hypothesis Testing and Changepoint Detection; CRC Press: Boca Raton, FL, USA, 2014.
5. Granjon, P. The CuSum Algorithm: A Small Review. Available online: https://hal.archives-ouvertes.fr/hal (accessed on 16 February 2018).
6. Basseville, M.; Nikiforov, I.V. Detection of Abrupt Changes: Theory and Application; Prentice Hall: Upper Saddle River, NJ, USA, 1993.
7. Lai, T.L. Information bounds and quick detection of parameter changes in stochastic systems. IEEE Trans. Inf. Theory 1998, 44.
8. Lorden, G.; Pollak, M. Nonanticipating estimation applied to sequential analysis and changepoint detection. Ann. Stat. 2005, 33.
9. Raginsky, M.; Marcia, R.F.; Silva, J.; Willett, R. Sequential probability assignment via online convex programming using exponential families. In Proceedings of the IEEE International Symposium on Information Theory, Seoul, Korea, 28 June–3 July 2009; IEEE: Piscataway, NJ, USA, 2009.
10. Raginsky, M.; Willett, R.; Horn, C.; Silva, J.; Marcia, R. Sequential anomaly detection in the presence of noise and limited feedback. IEEE Trans. Inf. Theory 2012, 58.
11. Peel, L.; Clauset, A. Detecting change points in the large-scale structure of evolving networks.
In Proceedings of he 29h AAAI Conference on Arificial Inelligence (AAAI), Ausin, TX, USA, January Li, S.; Xie, Y.; Farajabar, M.; Verma, A.; Song, L. Deecing weak changes in dynamic evens over neworks. IEEE Trans. Signal Inf. Process. Over New. 207, 3, Cesa-Bianchi, N.; Lugosi, G. Predicion, Learning, and Games; Cambridge Universiy Press: Cambridge, UK, Hazan, E. Inroducion o online convex opimizaion. Found. Trends Opim. 206, 2, Siegmund, D.; Yakir, B. Minimax opimaliy of he Shiryayev-Robers change-poin deecion rule. J. Sa. Plan. Inference 2008, 38, Azoury, K.; Warmuh, M. Relaive loss bounds for on-line densiy esimaion wih he exponenial family of disribuions. Mach. Learn. 200, 43,


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 38, No. 2, May 2013, pp. 209 227 ISSN 0364-765X (prin) ISSN 1526-5471 (online) hp://dx.doi.org/10.1287/moor.1120.0562 2013 INFORMS On Boundedness of Q-Learning Ieraes

More information

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation:

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation: M ah 5 7 Fall 9 L ecure O c. 4, 9 ) Hamilon- J acobi Equaion: Weak S oluion We coninue he sudy of he Hamilon-Jacobi equaion: We have shown ha u + H D u) = R n, ) ; u = g R n { = }. ). In general we canno

More information

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients Secion 3.5 Nonhomogeneous Equaions; Mehod of Undeermined Coefficiens Key Terms/Ideas: Linear Differenial operaor Nonlinear operaor Second order homogeneous DE Second order nonhomogeneous DE Soluion o homogeneous

More information

18 Biological models with discrete time

18 Biological models with discrete time 8 Biological models wih discree ime The mos imporan applicaions, however, may be pedagogical. The elegan body of mahemaical heory peraining o linear sysems (Fourier analysis, orhogonal funcions, and so

More information

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves Rapid Terminaion Evaluaion for Recursive Subdivision of Bezier Curves Thomas F. Hain School of Compuer and Informaion Sciences, Universiy of Souh Alabama, Mobile, AL, U.S.A. Absrac Bézier curve flaening

More information

13.3 Term structure models

13.3 Term structure models 13.3 Term srucure models 13.3.1 Expecaions hypohesis model - Simples "model" a) shor rae b) expecaions o ge oher prices Resul: y () = 1 h +1 δ = φ( δ)+ε +1 f () = E (y +1) (1) =δ + φ( δ) f (3) = E (y +)

More information

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19 Sequenial Imporance Sampling (SIS) AKA Paricle Filering, Sequenial Impuaion (Kong, Liu, Wong, 994) For many problems, sampling direcly from he arge disribuion is difficul or impossible. One reason possible

More information

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H.

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H. ACE 564 Spring 2006 Lecure 7 Exensions of The Muliple Regression Model: Dumm Independen Variables b Professor Sco H. Irwin Readings: Griffihs, Hill and Judge. "Dumm Variables and Varing Coefficien Models

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1 SZG Macro 2011 Lecure 3: Dynamic Programming SZG macro 2011 lecure 3 1 Background Our previous discussion of opimal consumpion over ime and of opimal capial accumulaion sugges sudying he general decision

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

CHAPTER 2 Signals And Spectra

CHAPTER 2 Signals And Spectra CHAPER Signals And Specra Properies of Signals and Noise In communicaion sysems he received waveform is usually caegorized ino he desired par conaining he informaion, and he undesired par. he desired par

More information

Matlab and Python programming: how to get started

Matlab and Python programming: how to get started Malab and Pyhon programming: how o ge sared Equipping readers he skills o wrie programs o explore complex sysems and discover ineresing paerns from big daa is one of he main goals of his book. In his chaper,

More information

) were both constant and we brought them from under the integral.

) were both constant and we brought them from under the integral. YIELD-PER-RECRUIT (coninued The yield-per-recrui model applies o a cohor, bu we saw in he Age Disribuions lecure ha he properies of a cohor do no apply in general o a collecion of cohors, which is wha

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

Bias-Variance Error Bounds for Temporal Difference Updates

Bias-Variance Error Bounds for Temporal Difference Updates Bias-Variance Bounds for Temporal Difference Updaes Michael Kearns AT&T Labs mkearns@research.a.com Sainder Singh AT&T Labs baveja@research.a.com Absrac We give he firs rigorous upper bounds on he error

More information

EKF SLAM vs. FastSLAM A Comparison

EKF SLAM vs. FastSLAM A Comparison vs. A Comparison Michael Calonder, Compuer Vision Lab Swiss Federal Insiue of Technology, Lausanne EPFL) michael.calonder@epfl.ch The wo algorihms are described wih a planar robo applicaion in mind. Generalizaion

More information

Empirical Process Theory

Empirical Process Theory Empirical Process heory 4.384 ime Series Analysis, Fall 27 Reciaion by Paul Schrimpf Supplemenary o lecures given by Anna Mikusheva Ocober 7, 28 Reciaion 7 Empirical Process heory Le x be a real-valued

More information

Games Against Nature

Games Against Nature Advanced Course in Machine Learning Spring 2010 Games Agains Naure Handous are joinly prepared by Shie Mannor and Shai Shalev-Shwarz In he previous lecures we alked abou expers in differen seups and analyzed

More information