SMOOTH ESTIMATION OF ROC CURVE IN THE PRESENCE OF AUXILIARY INFORMATION


J Syst Sci Complex 4:
Yong ZHOU, Haibo ZHOU, Yunbei MA
DOI:.7/s
Received: May 7 / Revised: August 8
© The Editorial Office of JSSC & Springer-Verlag Berlin Heidelberg

Abstract The receiver operating characteristic (ROC) curve is often used to study and compare two-sample problems in medicine. When more information is available on one treatment than on the other, the estimator of the ROC curve can be improved by taking the auxiliary population information into account. The authors show that the empirical likelihood method can be naturally adapted to make efficient use of auxiliary information in such problems. The authors propose a smoothed empirical likelihood estimator of the ROC curve that uses auxiliary information in medical studies. The proposed estimates are more efficient than ROC estimators that use no auxiliary information, in the sense of asymptotic variance and mean squared error (MSE). Some asymptotic properties of the empirical likelihood estimator of the ROC curve are established. A simulation study is presented to demonstrate the performance of the proposed estimators.

Key words Auxiliary information, empirical likelihood, ROC curve, smooth estimation.

1 Introduction

Applications of receiver operating characteristic (ROC) curve analysis can be found in almost every scientific field. Swets and Pickett [] listed references in a variety of subject areas where ROC curve methods had been used, ranging over signal detection, psychology, polygraphic detection, epidemiology, nutrition, radiology, and general medical decision making. Many scientific studies are designed to determine whether a new treatment is better than either no treatment or the best currently available treatment (i.e., the standard treatment). The popularity of the ROC curve technique originates from its ability to discriminate between a standard treatment and a new treatment, which allows competing treatments to be compared.
Yong ZHOU: Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 9, China; School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 433, China. Email: yzhou@amss.ac.cn.
Haibo ZHOU: Department of Biostatistics, University of North Carolina, Chapel Hill, NC , USA.
Yunbei MA: Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 9, China.
This research was partially supported by the National Natural Science Funds for Distinguished Young Scholar under Grant No and the National Natural Science Foundation of China (NSFC) under Grant No. 73, the National Basic Research Program under Grant No. 7CB849, Creative Research Groups of China under Grant No.7, and Shanghai University of Finance and Economics through Project Phase III and the Shanghai Leading Academic Discipline Project under Grant No. B83.
This paper was recommended for publication by Editor Guohua ZOU.

Let X and Y be the measured responses from the standard and the new treatments, following the distributions F and G, respectively. The ROC method is then based on comparing estimates of F and G. Various assumptions on the distribution functions have been considered. Parametric estimation of the ROC curve has been almost universally based on the assumption that the distributions of the response variables for the two groups are related by a location and scale shift; see [ 4]. Lloyd [5] proposed a fully nonparametric estimator of the ROC curve based on kernel techniques. Lloyd and Zhou [6] further showed that smoothed estimators of the ROC curve are better than the empirical one in the sense of mean squared error (MSE). Peng and Zhou [7] proposed a local linear regression estimator of the ROC curve, which directly addresses optimality in ROC curve estimation. Ren, Zhou, and Liang [8] proposed a flexible method for estimating the ROC curve based on a continuous-scale test. In the nonparametric setting, the empirical estimator of the ROC curve is easy to implement, and the area under the estimated curve (AUC) is the Mann-Whitney rank statistic for testing the difference between two populations. Qin and Zhou [9] proposed an empirical likelihood approach for inference on the AUC. Chambless and Diao [] also proposed an estimator of the time-dependent AUC for long-term risk prediction. In the binary regression framework, Pepe [] proposed a semiparametric estimator of the area under the ROC curve based on the generalized linear model (GLAM). Alonzo and Pepe [] considered a model in which the ROC curve is a parametric function of covariates, and proposed to estimate the area under the ROC curve based on binary indicators. Other models and estimation methods include Hsieh and Turnbull [4], Cai and Pepe [3], Hanley and McNeil [4], Pepe [5 7], Zhou [8], Heagerty and Zheng [9], Zheng, Cai, and Feng [], Albert [], and Horváth, Horváth, and Zhou [].
Under nonparametric assumptions, Lloyd [5] pointed out that estimators of the ROC curve based on empirical distributions are poor graphical summaries because of their discontinuous steps. Alternatively, one may use parametric models for the distributions of the random variables from the standard and new treatments. Some researchers (e.g., Campbell and Ratnaparkhi [], Hsieh [3]) fitted a parametric distribution to both variables through a location-scale shift model. In general, even if the distributions are truly related via a location-scale shift, the estimated ROC curve is still biased (see [5]).

In evaluating the efficacy of a new treatment relative to a standard treatment, the two associated samples frequently come with additional information about the standard treatment. Li, Tiwari, and Wells [4] studied a two-sample problem under the assumption that there is no auxiliary information on the new treatment, but additional information is available for the standard treatment. They therefore proposed a semiparametric model, assuming that F is known up to some unknown parameters and that G is completely unknown. Moving from the nonparametric model to the semiparametric model is quite a leap in terms of model assumptions. In some situations a parametric model for the standard treatment is not available. For example, the distribution function of the standard treatment may be unknown while some of its characteristics are available. In other words, one may not have enough knowledge to fit the distribution of X parametrically, but may know, to a lesser extent, some moments of X. In cancer studies, the extra information often takes the form of functions of moments. For instance, the mean survival time of patients with liver cancer or lung cancer is about half a year under standard treatment, whereas such information may not be available for patients under a new treatment.
More precisely, the mean of the distribution of the standard treatment data may be known, or the variance may be a known function of the mean, as occurs with estimating equations. Hence, the efficiency of ROC curve estimation can be improved when such auxiliary information is taken into account. We propose a new class of ROC curve estimators in the nonparametric setting by taking the

auxiliary information into account. The new ROC curve estimator is constructed by the empirical likelihood procedure. We incorporate nonparametric smoothing techniques [6] into the estimation. We also show that, when auxiliary information is present, the proposed ROC curve estimators are more efficient than those in the traditional nonparametric setting.

The rest of the paper is organized as follows. In Section 2, we propose an estimator based on empirical likelihood techniques and derive its asymptotic variance. We show that the proposed estimator is asymptotically normal. In Section 3, we evaluate the asymptotic relative efficiency of the proposed estimator with respect to existing ones in the literature in several examples. In Section 4, we conduct a simulation study to compare the proposed estimator with some classic estimators. Proofs of the theorems in Section 2 are given in the Appendix.

2 Empirical Likelihood Estimation of ROC Curve

2.1 Estimation Method

Let $\{X_i\}_{i=1,2,\ldots,m}$ be a sequence of i.i.d. responses from the standard treatment group with distribution F, independent of $\{Y_j\}_{j=1,2,\ldots,n}$, which is a sequence of i.i.d. responses from the new treatment group with distribution G. We assume that auxiliary information on the standard treatment is available in the form

$$E_F\,\phi_l(X) = 0, \quad l = 1, 2, \ldots, r, \tag{1}$$

where $\phi(x) = (\phi_1(x), \phi_2(x), \ldots, \phi_r(x))^T$ is a vector of known real functions. The ROC curve (see [5, 4]) is defined by

$$R(t) = 1 - G(F^{-1}(1-t)), \quad 0 \le t \le 1,$$

where $F^{-1}(t) = \inf\{x : F(x) \ge t\}$ is the quantile function of F at t. We assume that F and G are unknown except for (1). We can then estimate the ROC curve $R(t)$ with the nonparametric MLEs of F and G, which can be obtained by empirical likelihood methods (see [5 6]). The empirical likelihood method has sampling properties similar to the bootstrap, but instead of resampling it works by profiling a multinomial likelihood supported on the sample.
Obviously, the nonparametric MLE of G is its empirical distribution function without any constraint, i.e., $G_n(y) = n^{-1}\sum_{i=1}^{n} I(Y_i \le y)$, where $I(A)$ is the indicator of the set A. To obtain a nonparametric MLE of F under the auxiliary information, we use the empirical likelihood method. Let $p = (p_1, p_2, \ldots, p_m)^T$ denote a multinomial distribution on the points $X_1, X_2, \ldots, X_m$, and put $L(p) = \prod_{i=1}^{m} p_i$. In the presence of the additional distributional information expressed in (1), we maximize $L(p)$ subject to the constraints

$$p_i \ge 0, \quad \sum_{i=1}^{m} p_i = 1, \quad \sum_{i=1}^{m} p_i\,\phi_l(X_i) = 0, \quad l = 1, 2, \ldots, r.$$

When 0 is inside the convex hull of the points $\phi(X_1), \phi(X_2), \ldots, \phi(X_m)$, the maximum of $L(p)$ exists and is unique (see [7]). By the Lagrange multiplier method, it can be shown that

$$\max_{p} L(p) = \prod_{i=1}^{m} p_i,$$

where

$$p_i = \frac{1}{m}\,\frac{1}{1+\lambda^{T}\phi(X_i)}, \quad i = 1, 2, \ldots, m, \tag{2}$$

with $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_r)^T$, the Lagrange multiplier, being the solution of

$$\frac{1}{m}\sum_{i=1}^{m} \frac{\phi(X_i)}{1+\lambda^{T}\phi(X_i)} = 0. \tag{3}$$

To construct a consistent estimator of the ROC curve, it is necessary to construct consistent estimators of both unknown distribution functions F and G. We can derive an estimator of F from (2); it is a maximum empirical likelihood estimator (MELE). Let

$$\hat F_m(x) = \sum_{i=1}^{m} p_i\, I(X_i \le x); \tag{4}$$

then $\hat F_m(x)$ can be regarded as the MELE of the distribution function under the auxiliary information. If there is no auxiliary information, the profile empirical likelihood $L(p)$ attains its maximum at $p_i = 1/m$, and consequently $\hat F_m(x)$ reduces to $F_m(x) = m^{-1}\sum_{i=1}^{m} I(X_i \le x)$, which is the usual empirical distribution function.

To improve the efficiency of the MELE, we introduce kernel smoothing techniques to estimate the ROC curve. A smooth empirical likelihood estimator of F is defined as

$$\tilde F_m(x) = \int K\!\left(\frac{x-t}{h_1}\right) d\hat F_m(t) = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{1+\lambda^{T}\phi(X_i)}\, K\!\left(\frac{x - X_i}{h_1}\right), \tag{5}$$

and the smooth estimator of G is

$$\tilde G_n(y) = \frac{1}{n}\sum_{i=1}^{n} K\!\left(\frac{y - Y_i}{h_2}\right), \tag{6}$$

where $K(x) = \int_{-\infty}^{x} k(t)\,dt$, $k(x)$ is a suitable probability density function, and $h_i$ ($i = 1, 2$) are two sequences of bandwidths. A natural estimator of the ROC curve based on the smooth empirical likelihood estimators $\tilde F_m$ and $\tilde G_n$ can be written as

$$\tilde R(p) = 1 - \tilde G_n(\tilde F_m^{-1}(1-p)).$$

For simplicity, we call $\tilde R(p)$ the smooth empirical likelihood estimator (SELE). To compare the proposed estimator of the ROC curve with others in the literature, we introduce the estimator proposed by Lloyd [5] and Lloyd and Zhou [6] and the classic nonparametric empirical estimator. The smoothed estimator of the ROC curve in the absence of auxiliary information, denoted by LZSE, is defined by

$$\hat R(p) = 1 - \hat G_n(\bar F_m^{-1}(1-p)),$$

where $\bar F_m(x) = m^{-1}\sum_{i=1}^{m} K((x - X_i)/h_1)$ is the smooth version of the usual empirical distribution function $F_m(x)$ and $\hat G_n = \tilde G_n$. The classic nonparametric empirical estimator (CNEE) of the ROC curve, without any smoothing or auxiliary information, is

$$R_{mn}(p) = 1 - G_n(F_m^{-1}(1-p)),$$

where $G_n(y) = n^{-1}\sum_{i=1}^{n} I(Y_i \le y)$ is the empirical distribution function of Y.

2.2 Asymptotic Results

To derive asymptotic results we assume that the sample sizes satisfy $m = m(n)$ and $n/m \to \rho > 0$ as $n \to \infty$. We also assume that F and G have continuous densities f and g, respectively, at a given point $\theta_p = F^{-1}(1-p)$ for some $0 < p < 1$. We will prove in the Appendix that the mean and variance of the smoothed MELE $\tilde F_m(x)$ are approximately

$$E\tilde F_m(x) = E\bar F_m(x) + o(m^{-1}), \tag{7}$$

$$\operatorname{Var}\tilde F_m(x) = \operatorname{Var}\bar F_m(x) - \frac{1}{m}A^{T}(x)\Sigma^{-1}A(x) + o(m^{-1}), \tag{8}$$

where $\Sigma = E[\phi(X)\phi^{T}(X)]$ and $A(x) = E_F[\phi(X) I(X \le x)]$. It is noteworthy that these results cannot be derived directly from Qin and Lawless [7], because this estimator has been smoothed. Obviously, the variance of $\tilde F_m(x)$ is smaller than that of the usual empirical distribution function $F_m(x) = m^{-1}\sum_{i=1}^{m} I(X_i \le x)$.

For simplicity of expression, we assume that k is the derivative of the distribution function K and that it is a symmetric kernel such that $\int x\,k(x)\,dx = 0$. In general, we assume that $h_i \to 0$ and $nh_i \to \infty$, $i = 1, 2$. In fact, by Lemma 3 in the Appendix, we can show that the optimal bandwidths of $\tilde F_m$ and $\tilde G_n$ have the same order, satisfying $h_i = O(n^{-1/3})$ for $i = 1, 2$. We can use the same methods to choose the optimal bandwidth as those in [6, 8].

We can further approximate the bias and variance of the smooth empirical likelihood estimator $\tilde F_m(x)$. By some tedious calculations and Taylor's expansion, it follows from (7) and (8) that

$$E\tilde F_m(x) = F(x) + \tfrac{1}{2}h_1^{2}F''(x)\mu_2 + \text{lower-order terms};$$

$$\operatorname{Var}\tilde F_m(x) = \frac{1}{m}\Big[F(x)(1-F(x)) - A^{T}(x)\Sigma^{-1}A(x) - 2h_1\alpha_1F'(x)\Big] + \frac{1}{m}h_1^{2}F''(x)[1-2F(x)]\alpha_2 + O(m^{-1}h_1^{3}),$$

where $\mu_i = \int x^{i}k(x)\,dx$ and $\alpha_i = \int x^{i}k(x)K(x)\,dx$, $i = 0, 1, 2$. The first term of the asymptotic variance of $\tilde F_m(x)$ is the same as that of the usual empirical estimator $F_m(x)$ of F. The second term is attributed to the use of empirical likelihood techniques with the auxiliary information. The third term is due to the kernel smoothing procedure in estimating F, and this quantity decreases as $n \to \infty$. The last term in the asymptotic variance is negligible. Obviously, with a suitably selected kernel function $k(x)$, the asymptotic variance of $\tilde F_m$ can always be smaller than those of $\bar F_m(x)$ and $F_m(x)$ if $\Sigma$ is positive definite.
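The construction of Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes a single constraint $\phi(x) = x - \mu$ (known mean of F), uses the standard normal distribution function as the kernel K, and solves the scalar version of the multiplier equation by bisection rather than the Newton-Raphson method mentioned in the paper. The function names `el_weights`, `smooth_cdf`, `smooth_quantile`, and `sele_roc` are hypothetical, and the ROC definition $R(p) = 1 - G(F^{-1}(1-p))$ is assumed.

```python
import random
from statistics import NormalDist

K = NormalDist().cdf  # kernel distribution function K (assumed: standard normal)


def el_weights(x, mu):
    """Empirical likelihood weights p_i = 1 / (m * (1 + lam * phi_i)), phi_i = x_i - mu."""
    phi = [xi - mu for xi in x]
    if not min(phi) < 0.0 < max(phi):
        raise ValueError("mu must lie strictly inside the range of the sample")
    # lam must keep every 1 + lam * phi_i positive; on that interval the
    # score sum(phi_i / (1 + lam * phi_i)) is strictly decreasing in lam.
    lo = -(1.0 - 1e-10) / max(phi)
    hi = -(1.0 - 1e-10) / min(phi)
    for _ in range(200):  # bisection (assumption: used here instead of Newton-Raphson)
        lam = 0.5 * (lo + hi)
        score = sum(v / (1.0 + lam * v) for v in phi)
        if score > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    m = len(x)
    return [1.0 / (m * (1.0 + lam * v)) for v in phi]


def smooth_cdf(t, points, weights, h):
    """Weighted kernel distribution estimate: sum_i w_i * K((t - x_i) / h)."""
    return sum(w * K((t - s) / h) for s, w in zip(points, weights))


def smooth_quantile(q, points, weights, h):
    """Invert the smooth distribution estimate by bisection."""
    lo, hi = min(points) - 10.0 * h, max(points) + 10.0 * h
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if smooth_cdf(mid, points, weights, h) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sele_roc(p, x, y, mu, h1, h2):
    """Smoothed empirical likelihood ROC estimate at p, given the known mean mu of F."""
    w = el_weights(x, mu)
    theta = smooth_quantile(1.0 - p, x, w, h1)
    return 1.0 - smooth_cdf(theta, y, [1.0 / len(y)] * len(y), h2)
```

With a simulated standard-treatment sample `x` whose true mean is 0 and a new-treatment sample `y`, a call such as `sele_roc(0.5, x, y, 0.0, 0.3, 0.3)` returns the estimated curve at p = 0.5. The weights returned by `el_weights` sum to one and reproduce the known mean exactly, which is how the auxiliary information enters the estimate.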
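Hence, $\tilde F_m$ is asymptotically more efficient than $\bar F_m(x)$ and $F_m(x)$. Under the auxiliary information assumption, the asymptotic variance and bias of the proposed estimator (SELE) of the ROC curve are summarized in the following theorem.

Theorem 1 Assume that conditions 1-6 in the Appendix are satisfied. Then

$$\operatorname{Var}\tilde R(p) = \frac{R(p)(1-R(p))}{n} + \frac{p(1-p)R'^{2}(p)}{m} - \frac{g^{2}(\theta_p)}{f^{2}(\theta_p)}\,\frac{\Sigma_0}{m} + o(n^{-1}), \tag{9}$$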

bias Rp = R p { h f p p θ p + + h h g θ p +o Σ } p, (10)

where $A(x) = E_F[\phi(X)I(X \le x)]$, $\Sigma_0 = A^{T}(\theta_p)\Sigma^{-1}A(\theta_p)$, $\Sigma = E_F[\phi(X)\phi^{T}(X)]$ and $\theta_p = F^{-1}(1-p)$.

Remark 1 Without any auxiliary information or smoothing of the ROC curve estimator, the asymptotic variance and mean of the CNEE were derived by Lloyd and Zhou [6]:

$$\operatorname{Var}R_{mn}(p) = \frac{R(p)(1-R(p))}{n} + \frac{p(1-p)R'^{2}(p)}{m} + O(n^{-3/2}), \qquad E R_{mn}(p) = R(p) + O(n^{-3/4}), \tag{11}$$

respectively. In the absence of auxiliary information, the smooth estimator LZSE of the ROC curve has the following asymptotic variance and bias, which are due to Lloyd [5] and Lloyd and Zhou [6]:

Var Rp = Rpα Rp + pα pr p + O { n } bias Rp = R p h f p p θ p + h, + h h g θ p +o + h, (12)

where $\theta_p = F^{-1}(1-p)$ and $\alpha_i = \int x^{i}k(x)K(x)\,dx$, $i = 0, 1, 2$. Here, $\alpha_0 = 1/2$.

It is noteworthy that $\Sigma_0$ is always positive if the matrix $\Sigma$ is positive definite; hence, the third term of (9) is negative. By Remark 1, the asymptotic variance of the SELE of the ROC curve is therefore smaller than those of both the LZSE $\hat R(p)$ and the CNEE $R_{mn}(p)$. As a result, the proposed estimator is asymptotically more efficient than $\hat R(p)$ and $R_{mn}(p)$ when $\Sigma$ is positive definite. It follows that for $h_1 = h_2$, the bias of the proposed SELE of the ROC curve decreases in this case. If $R(p)$ is convex, then the absolute value of the bias of the estimator of the ROC curve also decreases, because the third term of (10) is smaller than the first term in the parentheses of (10), under the assumption that the bandwidths satisfy $nh_i^{3} \to$ a constant. The optimal bandwidth is $h_i = O_P(n^{-1/3})$ for $i = 1, 2$; hence, the third term in the parentheses is dominated by the first term since h // =h. Because of these arguments, the relationship between the optimal bandwidths $h_1$ and $h_2$ can also be neglected in Theorem 1.

Suppose that the slope of the curve $R(p)$, i.e., $R'(p) = g(\theta_p)/f(\theta_p)$, is bounded away from zero and infinity in a neighborhood of a point p, $0 < p < 1$. Then we can derive the strong consistency and asymptotic normality of the proposed estimator of the ROC curve.
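The slope identity $R'(p) = g(\theta_p)/f(\theta_p)$ used above follows from one application of the chain rule. The short derivation below assumes the ROC definition $R(p) = 1 - G(F^{-1}(1-p))$ with $\theta_p = F^{-1}(1-p)$, and that f is positive at $\theta_p$.

```latex
\begin{align*}
F(\theta_p) = 1 - p
  \;&\Longrightarrow\; f(\theta_p)\,\frac{d\theta_p}{dp} = -1
  \;\Longrightarrow\; \frac{d\theta_p}{dp} = -\frac{1}{f(\theta_p)}, \\
R(p) = 1 - G(\theta_p)
  \;&\Longrightarrow\; R'(p) = -g(\theta_p)\,\frac{d\theta_p}{dp}
  = \frac{g(\theta_p)}{f(\theta_p)} .
\end{align*}
```

This is why the factor $g^{2}(\theta_p)/f^{2}(\theta_p)$ in the variance expressions can be read as $R'^{2}(p)$.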
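Theorem 2 Assume that conditions 1-5 in the Appendix are satisfied. Then, as $n \to \infty$, we have

$$\tilde R(p) \to R(p) \quad \text{in probability or almost surely.} \tag{13}$$

Furthermore, if $g'(x)$ is a continuous function at the point $\theta_p$ and $E\|\phi(X)\|^{4} < \infty$, then

$$\tilde R(p) - R(p) = O\big(\sqrt{n^{-1}\log\log n}\big) \quad \text{almost surely.}$$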

From (9) and (10), it is straightforward to prove the first result of Theorem 2 for the case of convergence in probability.

Theorem 3 Assume that conditions 1-6 in the Appendix are satisfied. Then $\tilde R(p)$ is asymptotically normal with mean $R(p)$, that is,

$$\sqrt{n}\,\big(\tilde R(p) - R(p)\big) \xrightarrow{\;L\;} N(0, \sigma^{2}(p)), \tag{14}$$

where

$$\sigma^{2}(p) = R(p)(1-R(p)) + \rho\,\{p(1-p) - \Sigma_0\}\,\frac{g^{2}(\theta_p)}{f^{2}(\theta_p)} \tag{15}$$

with $\Sigma_0 = A^{T}(\theta_p)\Sigma^{-1}A(\theta_p)$.

Remark 2 In the absence of auxiliary information, the LZSE $\hat R(p)$ of the ROC curve also converges to a normal distribution, with an expression similar to (14) in which the asymptotic variance is

$$\sigma_0^{2}(p) = R(p)(1-R(p)) + p(1-p)\,\rho\,\frac{g^{2}(\theta_p)}{f^{2}(\theta_p)}. \tag{16}$$

Comparing (16) with (15) and Remark 1, we see that the smooth empirical likelihood estimator (SELE) of the ROC curve in the presence of auxiliary information has a smaller asymptotic variance than the LZSE $\hat R(p)$ and the CNEE $R_{mn}(p)$ without auxiliary information [5 6]. In fact, a more general result follows from comparing (15) with (16). Let $\sigma_r^{2}$ be the variance $\sigma^{2}$ in (15) under auxiliary information with r functions in $\phi$, i.e., $\phi(x) = (\phi_1(x), \phi_2(x), \ldots, \phi_r(x))^T$. We can easily prove the following corollary by a proof similar to that of Qin and Lawless [7].

Corollary 1 Under the conditions of Theorem 1, for $r \ge 1$, we have $\sigma_r^{2} \le \sigma_{r-1}^{2}$, where $\sigma_0^{2}$ (for $r = 0$) is defined in (16).

Remark 3 Consider the case in which both X and Y have auxiliary information:

$$E_G\,\varphi_k(Y) = 0, \quad k = 1, 2, \ldots, s,$$

where $\varphi(y) = (\varphi_1(y), \varphi_2(y), \ldots, \varphi_s(y))^T$ is a vector of known real functions. Similarly to the MELE of F, we can obtain

$$\hat G_n^{*}(y) = \sum_{i=1}^{n} q_i\, I(Y_i \le y)$$

as the MELE of $G(y)$, where

$$q_i = \frac{1}{n}\,\frac{1}{1+\gamma^{T}\varphi(Y_i)}, \quad i = 1, 2, \ldots, n,$$

with $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_s)^T$ satisfying

$$\frac{1}{n}\sum_{i=1}^{n} \frac{\varphi(Y_i)}{1+\gamma^{T}\varphi(Y_i)} = 0.$$

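Then the smooth empirical likelihood estimator of G is defined as

$$\tilde G_n^{*}(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{1+\gamma^{T}\varphi(Y_i)}\, K\!\left(\frac{y - Y_i}{h_2}\right).$$

A natural estimator of the ROC curve based on the smooth empirical likelihood estimators $\tilde F_m$ and $\tilde G_n^{*}$ can then be expressed as

$$\tilde R^{*}(p) = 1 - \tilde G_n^{*}(\tilde F_m^{-1}(1-p)).$$

By an argument similar to the one above, we can also show that $\tilde R^{*}(p)$ is more efficient than the LZSE and the CNEE of the ROC curve, and even than $\tilde R(p)$.

2.3 Estimation of Asymptotic Variance

In order to estimate the asymptotic variance $\sigma^{2}(p)$, we need to estimate the ROC curve $R(p)$, the density functions f and g, and the quantity $\Sigma_0$. The estimator of $R(p)$ can be obtained from the SELE $\tilde R(p)$, and the density functions can be estimated by kernel methods, that is,

$$\hat f(x) = \frac{1}{mh_1}\sum_{i=1}^{m} \frac{1}{1+\hat\lambda^{T}\phi(X_i)}\, k\!\left(\frac{x - X_i}{h_1}\right), \qquad \hat g(x) = \frac{1}{nh_2}\sum_{i=1}^{n} k\!\left(\frac{x - Y_i}{h_2}\right),$$

where the kernel functions $k(x)$ in the estimators $\hat f(x)$ and $\hat g(x)$ may be different, and $\hat\lambda$ is the root of Equation (3). Note that $\Sigma_0 = A^{T}(\theta_p)\Sigma^{-1}A(\theta_p)$; hence we need to estimate $A(x)$ and $\Sigma$. We can easily obtain consistent estimators of them using the empirical likelihood technique, that is,

$$\hat A(x) = \frac{1}{m}\sum_{i=1}^{m} \frac{\phi(X_i)\,I(X_i \le x)}{1+\hat\lambda^{T}\phi(X_i)}, \qquad \hat\Sigma = \frac{1}{m}\sum_{i=1}^{m} \frac{\phi(X_i)\phi^{T}(X_i)}{1+\hat\lambda^{T}\phi(X_i)},$$

and $\hat\theta_p = \tilde F_m^{-1}(1-p)$, where $\hat\lambda$ can be solved from Equation (3) by the Newton-Raphson method. As a result, we obtain a consistent estimator of the asymptotic variance,

$$\hat\sigma^{2}(p) = \tilde R(p)(1-\tilde R(p)) + \{p(1-p) - \hat\Sigma_0\}\,\frac{n\,\hat g^{2}(\hat\theta_p)}{m\,\hat f^{2}(\hat\theta_p)},$$

where $\hat\Sigma_0 = \hat A^{T}(\hat\theta_p)\hat\Sigma^{-1}\hat A(\hat\theta_p)$.

3 Asymptotic Relative Efficiency

We consider asymptotic relative efficiency in the sense of comparing the asymptotic variances in the normal approximation, i.e., Theorem 3. It is easy to see that the asymptotic relative efficiency of the estimator $\tilde R(p)$ in the presence of auxiliary information, relative to the LZSE $\hat R(p)$ proposed by Lloyd [5] and Lloyd and Zhou [6] in the absence of auxiliary information, is

$$\mathrm{ARE}(p) = 1 - \frac{\rho\,\Big[\int_{-\infty}^{\theta_p}\phi^{T}(x)\,dF(x)\Big]\Sigma^{-1}\Big[\int_{-\infty}^{\theta_p}\phi(x)\,dF(x)\Big]R'^{2}(p)}{R(p)(1-R(p)) + R'^{2}(p)\,p(1-p)\,\rho}.$$

Although both estimators are biased, the absolute bias of $\tilde R(p)$ is uniformly less than that of $\hat R(p)$ in large samples. Hence, we may define the asymptotic relative efficiency by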

the ratio of the asymptotic variance of the estimator $\tilde R(p)$ to that of the estimator $\hat R(p)$. Later on, in finite samples, we will compare the MSEs of the estimators $\tilde R(p)$, $\hat R(p)$ and $R_{mn}(p)$ as a relative efficiency by simulation. Note that $\hat R(p)$ and $R_{mn}(p)$ have the same asymptotic variance; hence, the asymptotic relative efficiencies of the proposed SELE $\tilde R(p)$ against both estimators $\hat R(p)$ and $R_{mn}(p)$ are the same.

To illustrate the asymptotic relative efficiency, we introduce several simple examples. Suppose that $X_1, X_2, \ldots, X_m$ are i.i.d. random variables with unknown distribution function F and density function f, and that the mean of F is known to be a fixed constant, say $EX = \mu$. Assume that $Y_1, Y_2, \ldots, Y_n$ are i.i.d. random variables with unknown distribution function G and density function g. Under this setting, we have $r = 1$ and $\phi(X) = \phi_1(X) = X - \mu$ in (1). From Theorem 1, the asymptotic variance of $\sqrt{n}\,\tilde R(p)$ is

$$\sigma^{2}(p) = R(p)(1-R(p)) + \rho R'^{2}(p)\,p(1-p) - \rho R'^{2}(p)\,\frac{1}{\sigma_F^{2}}\Big(\int_{-\infty}^{\theta_p}(x-\mu)\,dF(x)\Big)^{2},$$

where $\sigma_F^{2}$ is the variance of the distribution function F, i.e., $\sigma_F^{2} = E(X-\mu)^{2}$. If the auxiliary information $EX = \mu$ is ignored, the usual smooth estimator $\hat R(p)$ without auxiliary information has the asymptotic variance given in (16), i.e., the asymptotic variance of $\sqrt{n}\,\hat R(p)$ is

$$\sigma_0^{2}(p) = R(p)(1-R(p)) + R'^{2}(p)\,p(1-p)\,\rho.$$

(a) Assume that $F = G$. We obtain that the asymptotic relative efficiency is

$$\mathrm{ARE}_1(p) = \frac{\sigma^{2}(p)}{\sigma_0^{2}(p)} = 1 - \frac{\rho\,\Big(\int_{-\infty}^{\theta_p}(x-\mu)\,dF(x)\Big)^{2}}{(\rho+1)\,\sigma_F^{2}\,p(1-p)}.$$

If X is normally distributed, $N(\mu, \sigma_F^{2})$, then we can show that the asymptotic relative efficiency is

$$\mathrm{ARE}_1(p) = 1 - \frac{\rho\,\exp\{-[\Phi^{-1}(p)]^{2}\}}{2\pi(\rho+1)\,p(1-p)},$$

where $\Phi$ is the standard normal distribution function. The ARE is easily plotted; see Figure 1. Figure 1(a) illustrates the asymptotic relative efficiency $\mathrm{ARE}_1(p)$. The ARE curve depends significantly on the ratio of the two sample sizes, and the $\mathrm{ARE}_1(p)$ curve is symmetric. $\mathrm{ARE}_1(p)$ is minimized at $p = .5$ and grows larger as p moves away from .5. These properties are exactly as expected, because the mean and the median of the normal distribution coincide.
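The normal-case efficiency curve in (a) is simple to evaluate numerically. The sketch below assumes the closed form $\mathrm{ARE}_1(p) = 1 - \rho\exp\{-[\Phi^{-1}(p)]^{2}\}/\{2\pi(\rho+1)p(1-p)\}$, obtained from the variance in Theorem 3 with $F = G$ normal and a known mean; the function name `are1` is hypothetical. Values below one mean that the estimator using the auxiliary information has the smaller asymptotic variance.

```python
import math
from statistics import NormalDist


def are1(p, rho):
    """Asymptotic variance ratio ARE_1(p) for F = G normal with known mean (assumed form)."""
    xi = NormalDist().inv_cdf(p)  # standard normal quantile
    return 1.0 - rho * math.exp(-xi * xi) / (2.0 * math.pi * (rho + 1.0) * p * (1.0 - p))


# Evaluate the curve on a grid for a few sample-size ratios rho.
grid = [i / 100.0 for i in range(1, 100)]
curves = {rho: [are1(p, rho) for p in grid] for rho in (0.5, 1.0, 2.0)}
```

At $p = 0.5$ and $\rho = 1$ the expression reduces to $1 - 1/\pi$, and the curve is symmetric about $p = 0.5$, in line with the description of Figure 1(a).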
So if the mean $\mu$ is known, we have auxiliary information for estimating the median of the unknown distribution F, which is needed in the estimation of the ROC curve. We expect to estimate the ROC curve $R(p)$ by $\tilde R(p)$ more accurately and efficiently for p near .5 than for p away from .5. $\mathrm{ARE}_1$ implies that the SELE $\tilde R(p)$ of the ROC curve is always more efficient than the other estimators $\hat R(p)$ and $R_{mn}(p)$, because $0 < \mathrm{ARE}_1(p) < 1$ for $0 < p < 1$. In this case, we can see that the asymptotic relative efficiency does not depend on the magnitude of the means and variances of F and G.

(b) When the means or variances of the normal distributions are different, what shape of the asymptotic relative efficiency curve do we expect? Consider the case in which F is $N(\mu_1, \sigma_1^{2})$ and G is $N(\mu_2, \sigma_2^{2})$. Then we have

$$\mathrm{ARE}_2(p) = 1 - \frac{\rho A\,\sigma_1^{2}/\sigma_2^{2}}{2\pi\,\Phi(\Delta_p)\,(1-\Phi(\Delta_p)) + 2\pi\rho A\,(\sigma_1^{2}/\sigma_2^{2})\exp(\xi_p^{2})\,p(1-p)},$$

where $\Delta_p = [\sigma_1\xi_p - (\mu_2 - \mu_1)]/\sigma_2$, $A = \exp(-\Delta_p^{2})$, and $\xi_p = \Phi^{-1}(1-p)$. When $\mu_1 = \mu_2$ and $\sigma_1 = \sigma_2$, we have the same properties as in case (a), because $\mathrm{ARE}_2$ reduces to $\mathrm{ARE}_1$.

[Figure 1: four panels (a)-(d), each titled with its values of $(\mu_1, \sigma_1, \mu_2, \sigma_2)$ and showing the ARE curve against p for three values of $\rho$.]

Figure 1 Asymptotic relative efficiency of estimators of the ROC curve: the population distributions are F normal, $N(\mu_1, \sigma_1^{2})$, and G normal, $N(\mu_2, \sigma_2^{2})$, in (a)-(d); the parameters $\mu_1$, $\sigma_1$, $\mu_2$ and $\sigma_2$ are given in the titles of (a)-(d), respectively. Solid, dash-dotted, and dashed curves are the asymptotic relative efficiency for the three values of $\rho$

Figures 1(b)-(d) illustrate the asymptotic relative efficiency $\mathrm{ARE}_2$. Obviously, the $\mathrm{ARE}_2(p)$ curve also depends significantly on the ratio of the two sample sizes. When the means of F and G are the same, even if F and G are different, the ARE curves are symmetric; otherwise the $\mathrm{ARE}_2(p)$ curves are skewed. It can be seen that when the $\mathrm{ARE}_2(p)$ curve is symmetric, $\mathrm{ARE}_2(p)$ attains its minimum at $p = .5$ and increases as p moves away from .5. These properties are also exactly as expected, because the mean and the median of the normal distribution coincide. So we expect to estimate the ROC curve $R(p)$ by $\tilde R(p)$ more accurately and efficiently for p near .5 than for p away from .5, as found in (a). Similarly, as one would expect, the $\mathrm{ARE}_2(p)$ curve implies that the SELE $\tilde R(p)$ of the ROC curve is more efficient than the other two, $\hat R(p)$ and $R_{mn}(p)$. When $\mu_1 \ne \mu_2$, $\mathrm{ARE}_2(p)$ is skewed and attains its minimum at around $p = .7$ for ρ =. Although the mean and the median of the normal distribution F are the same, the mean of G is no longer the same as that of F. As a result, the distribution G is symmetric about another point, $\mu_2$, rather than $\mu_1$. Therefore, $\mathrm{ARE}_2(p)$ is skewed and its minimum is no longer at the point .5.

(c) If X and Y have the same exponential distribution $F(x) = G(x) = 1 - \exp\{-x\}$, i.e., $X \sim \mathrm{Exp}(1)$ and $Y \sim \mathrm{Exp}(1)$, then the asymptotic relative efficiency of the SELE $\tilde R(p)$ versus the LZSE $\hat R(p)$ or the CNEE $R_{mn}(p)$ is

$$\mathrm{ARE}_3(p) = 1 - \frac{\rho\,p\log^{2}p}{(\rho+1)(1-p)}.$$

In general, if $F \sim \mathrm{Exp}(\mu_1)$ and $G \sim \mathrm{Exp}(\mu_2)$, we have

$$\mathrm{ARE}_4(p) = 1 - \frac{\rho\,p^{2\mu_1/\mu_2}\log^{2}p\,(\mu_1/\mu_2)^{2}}{p^{\mu_1/\mu_2}\,(1 - p^{\mu_1/\mu_2}) + \rho\,(\mu_1/\mu_2)^{2}\,p^{2\mu_1/\mu_2 - 1}(1-p)}.$$

Similarly, Figures 2(a)-(b) illustrate some different properties of the estimator $\tilde R(p)$. The asymptotic relative efficiency curve $\mathrm{ARE}_4(p)$ is always skewed. The bigger the difference between $\mu_1$ and $\mu_2$, the more skewed the curve $\mathrm{ARE}_4(p)$. Obviously, $\mathrm{ARE}_4(p)$ also depends on the ratio of the sample sizes from F and G. We also found that the ratio of the variances of F and G has an apparent effect on the asymptotic relative efficiency, where the bigger the ratio $\rho$, the bigger the $\mathrm{ARE}_4(p)$. Hence, a larger sample from G than from F could lead to improved efficiency of the proposed estimator of the ROC curve. The same conclusions hold in cases (a) and (b) as well.

[Figure 2: four panels (a)-(d); panels (a) and (b) are titled with their values of $(\mu_1, \mu_2)$, panels (c) and (d) with their values of $(\mu_1, \sigma_1)$, each showing the ARE curve against p for three values of $\rho$.]

Figure 2 Asymptotic relative efficiency of estimators of the ROC curve: the population distributions F and G are exponential distributions with means $\mu_1$ and $\mu_2$, respectively, in (a)-(b); F is the normal distribution $N(\mu_1, \sigma_1^{2})$ and G the bi-exponential distribution in (c)-(d). The parameters $\mu_1$, $\sigma_1$ and $\mu_2$ are given in the titles of (a)-(d). Solid, dash-dotted, and dashed curves are the asymptotic relative efficiency for the three values of $\rho$

(d) Now, we consider a combination of distributions. Suppose that the distribution function F is the normal distribution $N(\mu_1, \sigma_1^{2})$ and G is the bi-exponential distribution, i.e., G has the density

function $g(x) = \tfrac{1}{2}\exp\{-|x|\}$. Then

$$R(p)(1-R(p)) = \tfrac{1}{2}\exp\{-|\theta_p|\}\big(1 - \tfrac{1}{2}\exp\{-|\theta_p|\}\big), \qquad R'^{2}(p) = \frac{\pi\sigma_1^{2}}{2}\exp\{-2|\theta_p|\}\exp\{(\theta_p-\mu_1)^{2}/\sigma_1^{2}\},$$

where $\theta_p = F^{-1}(1-p)$. Let $\xi_p = \Phi^{-1}(1-p)$. Obviously, $\theta_p = \mu_1 + \sigma_1\xi_p$. It is easy to obtain that

$$\int_{-\infty}^{\theta_p}\phi(x)\,dF(x) = -\frac{\sigma_1}{\sqrt{2\pi}}\exp\{-\xi_p^{2}/2\}.$$

Hence, the asymptotic relative efficiency is

$$\mathrm{ARE}_5(p) = 1 - \frac{\rho\sigma_1^{2}\exp\{-|\theta_p|\}}{2 - \exp\{-|\theta_p|\} + 2\rho\pi\sigma_1^{2}\exp\{\xi_p^{2} - |\theta_p|\}\,p(1-p)}.$$

When $\mu_1 = 0$ and $\sigma_1 = 1$,

$$\mathrm{ARE}_6(p) = 1 - \frac{\rho\exp\{-|\xi_p|\}}{2 - \exp\{-|\xi_p|\} + 2\rho\pi\exp\{\xi_p^{2} - |\xi_p|\}\,p(1-p)},$$

where $\xi_p = \Phi^{-1}(1-p)$.

Figures 2(c) and 2(d) illustrate properties similar to those of $\mathrm{ARE}_4$ mentioned before. In this example, we notice that the mean of G (i.e., $\mu_2$) is 0 and its variance (i.e., $\sigma_2^{2}$) is 2. In Figure 2(c), we put $\mu_1 = 0$, so the means of F and G are the same; hence, the $\mathrm{ARE}_5(p)$ curve is symmetric and $\mathrm{ARE}_5(p)$ is also minimized at around $p = .5$. In this case $\mathrm{ARE}_5(p)$ has a unique minimum, as in Figure 1, and results similar to those of Figure 1(a) are obtained. However, in Figure 2(d) the mean $\mu_1$ of F differs from the mean of G. As a result, although the distribution G is symmetric, it is symmetric about a point other than $\mu_1$, and $\mathrm{ARE}_5(p)$ is no longer symmetric. Therefore, $\mathrm{ARE}_5(p)$ is skewed and its minimum is no longer at the point .5. Finally, we find the same phenomenon that $\mathrm{ARE}_5(p)$ also depends on the ratio of the sample sizes from F and G. The bigger the ratio, the more efficient the proposed SELE $\tilde R(p)$. Hence, oversampling from G will lead to improved efficiency of the proposed estimator of the ROC curve.

4 Numerical Studies

We conduct a simulation study to investigate the finite-sample properties of the proposed estimator of the ROC curve. Specifically, we consider the SELE $\tilde R(p)$ of the ROC curve when the population distribution F has a known mean $\mu_1$. Two simulation examples are performed. In each case we consider three sample sizes, n =5,, 5. The simulation in each example is repeated times. As in the examples in Section 3, we consider two different population distributions for the standard treatment group.
The first one is the normal distribution $N(\mu_1, \sigma_1^{2})$, and the second is a mixture of the normal distributions $N(\mu_1, \sigma_1^{2})$ and $N(\mu_1 +, \sigma_1^{2})$ with mixture coefficient $\alpha = .9$. We choose a normal mixture as the second distribution for the standard treatment group because such bimodal distributions are harder to estimate using kernel

methods. The population distribution G of the new treatment group is the normal distribution $N(\mu_2, \sigma_2^{2})$. Specifically, the distributions of the two populations F and G are as follows:

$$F \sim N(\mu_1, \sigma_1^{2}), \quad G \sim N(\mu_2, \sigma_2^{2}),$$

and

$$F \sim \alpha N(\mu_1, \sigma_1^{2}) + (1-\alpha)N(\mu_1 +, \sigma_1^{2}), \quad G \sim N(\mu_2, \sigma_2^{2}),$$

where the parameters $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$ are set as in Table 1. In both simulation studies, we assume that the mean $\mu_1$ of the distribution F for the standard treatment group is always known as auxiliary information.

Table 1 Comparison of estimators of the ROC curve based on mean square errors: SELE against LZSE and CNEE. The population distributions: F is the normal distribution $N(\mu_1, \sigma_1^{2})$ and G is $N(\mu_2, \sigma_2^{2})$

Columns: μ1 | σ1 | μ2 | σ2 | n | MSE_sau | MSE_s | MSE_e | IARE_s | IARE_e

Note: MSE_sau, MSE_s and MSE_e denote the mean square errors of the proposed smooth estimator SELE with auxiliary information, the smooth estimator LZSE proposed by Lloyd and Zhou [6], and the classic nonparametric empirical estimator CNEE of the ROC curve, respectively. IARE_s denotes the reciprocal of the relative efficiency of the SELE $\tilde R(p)$ against the LZSE $\hat R(p)$, and IARE_e denotes the reciprocal of the relative efficiency of the SELE $\tilde R(p)$ against the CNEE $R_{mn}(p)$, i.e., IARE_s = MSE_s/MSE_sau and IARE_e = MSE_e/MSE_sau.

The performance of the estimator $\tilde R(p)$ is assessed via the mean square error

$$\mathrm{MSE} = \frac{1}{n_{\mathrm{grid}}}\sum_{i=1}^{n_{\mathrm{grid}}}\big[\tilde R(w_i) - R(w_i)\big]^{2},$$

where $\{w_i, i = 1, 2, \ldots, n_{\mathrm{grid}}\}$ are the grid points satisfying $0 < w_i < 1$ at which the ROC curve $R(p)$ is estimated. In these simulations, n grid =. The MSEs of the estimators of the ROC curve, $\tilde R(p)$, $\hat R(p)$ and $R_{mn}(p)$, are calculated to compare their performance.

We use a simple method for bandwidth selection. From (9) and (10), we can derive the optimal bandwidth by minimizing the MSE of the smoothed estimator of the ROC curve. It is easy to show that the optimal bandwidth satisfies $h_i = O(n^{-1/3})$, which is the same order as the optimal bandwidths for estimating $F(x)$ and $G(x)$. However, this optimal bandwidth involves too many unknown quantities that need to be estimated. Some useful methods for selecting the optimal bandwidth can be found in [6, 8 9]. In our simulation, for simplicity, we follow the approach of Lloyd and Zhou [6]: we employ the optimal bandwidths for estimating $F(x)$ and $G(x)$ to estimate the ROC curve, although bandwidths optimized for the estimation of F and G are probably not optimal for estimating $R(p)$. Some methods for choosing the bandwidth for the estimation of G were proposed by Wand and Jones [3] and Lloyd and Zhou [6]. For convenience, we briefly outline this method of choosing the optimal bandwidth here.

In these simulations, we consider a global optimal bandwidth for the estimation of G. The smoothed estimator $\tilde G_n$ of G has the minimum IMSE (integrated mean square error) with the optimal bandwidth

$$h_o = \{\alpha/(n\psi)\}^{1/3},$$

where $\psi = -\int g''(x)g(x)\,dx$, and $\alpha = .56$ for the standard normal kernel used throughout, i.e.,

$$k(x) = \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{x^{2}}{2}\Big).$$

Obviously, we can estimate $\psi$ by

$$\hat\psi = -\frac{1}{n}\sum_{j=1}^{n}\hat g''(Y_j, h),$$

where $\hat g''(Y_j, h)$ is the kernel estimator of $g''$. By the method proposed by Wand and Jones [3] or the method suggested by Lloyd and Zhou [6], the optimal bandwidth $\hat h$ for the estimator $\hat g''(Y_j, h)$ can be obtained. Hence, the optimal bandwidth $h_o$ for the smoothed estimator $\tilde G_n$ can be obtained by plug-in methods.
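The grid-based MSE and a simple bandwidth choice can be sketched as follows. This is an illustration under assumptions, not the simulation code of the paper: it evaluates a smoothed ROC estimate without auxiliary information (in the spirit of the LZSE), uses a normal-reference bandwidth $h = s\,(4/n)^{1/3}$ in place of the plug-in rule (this is what the IMSE-optimal bandwidth for a kernel distribution estimate becomes when the density is normal and the kernel is standard normal), and takes midpoints of equal cells as grid points. All function names, the grid size, and the sample sizes are hypothetical.

```python
import random
from statistics import NormalDist, stdev

N01 = NormalDist()  # standard normal: kernel distribution function and quantiles


def ref_bandwidth(sample):
    """Normal-reference bandwidth s * (4 / n)^(1/3) (assumption: replaces the plug-in rule)."""
    return stdev(sample) * (4.0 / len(sample)) ** (1.0 / 3.0)


def smooth_cdf(t, sample, h):
    """Kernel-smoothed empirical distribution function."""
    return sum(N01.cdf((t - s) / h) for s in sample) / len(sample)


def smooth_quantile(q, sample, h):
    """Invert the smoothed distribution function by bisection."""
    lo, hi = min(sample) - 10.0 * h, max(sample) + 10.0 * h
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if smooth_cdf(mid, sample, h) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def smooth_roc(p, x, y, h1, h2):
    """Smoothed ROC estimate 1 - G(F^{-1}(1 - p)) without auxiliary information."""
    return 1.0 - smooth_cdf(smooth_quantile(1.0 - p, x, h1), y, h2)


def binormal_roc(p, mu1, s1, mu2, s2):
    """True ROC curve when F = N(mu1, s1^2) and G = N(mu2, s2^2)."""
    theta = mu1 + s1 * N01.inv_cdf(1.0 - p)
    return 1.0 - N01.cdf((theta - mu2) / s2)


def grid_mse(estimate, truth, n_grid=25):
    """Average squared error over n_grid points strictly inside (0, 1)."""
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    return sum((estimate(w) - truth(w)) ** 2 for w in grid) / n_grid


random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(100)]  # standard treatment sample
y = [random.gauss(1.0, 1.0) for _ in range(100)]  # new treatment sample
h1, h2 = ref_bandwidth(x), ref_bandwidth(y)
mse = grid_mse(lambda p: smooth_roc(p, x, y, h1, h2),
               lambda p: binormal_roc(p, 0.0, 1.0, 1.0, 1.0))
```

Repeating the last block over many simulated data sets and averaging `mse` gives the kind of Monte Carlo MSE that the tables compare across estimators.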
Checking the asymptotic mean and variance (7) and (8) of the smoothed estimator $\tilde F_m$ of the distribution F under the auxiliary information, we can show that a method similar to that for selecting the optimal bandwidth of the smoothed estimator $\tilde G_n$ can be applied to the smoothed estimator $\tilde F_m$.

Tables 1 and 2 summarize the results of the simulations. All parameters can be found in these tables. In Tables 1 and 2, MSE_sau, MSE_s and MSE_e denote, respectively, the average MSEs of the SELE with known population mean, the smooth estimator LZSE proposed by Lloyd [5] and Lloyd and Zhou [6], and the CNEE of the ROC curve. IARE_s and IARE_e denote the reciprocals of the relative efficiency, in the sense of the MSE ratio, of the LZSE and the CNEE of the ROC curve against the proposed estimator SELE of the ROC curve, respectively. All results in Tables 1 and 2 indicate that the proposed estimator is more efficient than the LZSE and the CNEE. When $n > m$, the relative efficiency of the proposed estimator is higher than that of the other two estimators. In particular, the bigger the ratio (i.e., $n/m$) of the sizes of the two samples, the higher the relative efficiency of the proposed estimator. These results

15 SMOOTH ROC ESTIMATION WITH AUXILIARY INFORMATION 933 are expected, and siilar to those in the above section can also be found. This is because there is an auxiliary inforation for distribution F, but not for distribution G. We can iprove the efficiency of the proposed estiator by increasing ore saple fro distribution G than saple fro distribution F. Table Coparison of estiators of ROC curve based on ean square errors: SELE against LZSE and CNEE. The population distributions: F is the noral ixture of Nμ,σ andnμ +,σ, and G is Nμ,σ μ σ μ σ n MSE sau MSE s MSE e IARE s IARE e Note: MSE sau, MSE s and MSE e denote the ean square errors of the proposed sooth estiator SELE with auxiliary inforation here, sooth estiator LZSE proposed by Lloyd and Zhou [6] and the classic nonparaetric epirical estiator CNEE of ROC curve, respectively. ARE s denotes the reciprocal of relative efficiency of the SELE Rp against the LZSE Rp andare e denotes the reciprocal of relative efficiency of the SELE Rp against the CNEE R np, i.e., ARE s = MSE s/mse sau and ARE e =MSE e/mse sau. Figures 3 and 4 express that the MSE of the proposed estiator of ROC curve is saller uniforly than the other two estiators for <p<. Meanwhile, we can find that the SELE has saller bias than the LZSE and the CNEE in c, f, and i in Figures 3 and 4. 5 Conclusion Reark We develop a ethod to estiate ROC curve and to iprove the asyptotic efficiency of

16 934 YONG ZHOU HAIBO ZHOU YUNBEI MA estiation of ROC curve with the additional inforation fro the standard treatent. The epirical likelihood procedure plays a key role to deal with auxiliary inforation in constructing the soothed epirical likelihood estiator of ROC curve. The seiparaetric approach iposes a paraetric assuption on the distribution F fro standard treatent. However, when the odel is isspecified, it often results in large bias of estiation. Generally, the nonparaetric approach is ore robust as copared with the seiparaetric ethods and paraetric ethods. Since our approach is entirely based on nonparaetric odel, the proposed approach is ore robust against the odel is-specification. The proposed approach can be used to deal with other types of data. For exaple, the ethod can be extended to censored data directly. In addition, it is possible to ake other inferences under our odel assuptions. For exaple, the siilar results can be established for the horizontal two-saple quantile coparison function G F p and the vertical quantile coparison function GF p, whose graphs are the quantile-quantile plot Q-Q plot and percentile-percentile plot P-P plot, respectively. Another interesting proble is estiation of the area under ROC curve. It would be of interest to extend our approach to estiate the area under ROC curve in the fraework of both nonparaetric odel and seiparaetric odel.. a MSE b box plot of MSE c Estiator of ROC curve x 3 d MSE. MSE MSE MSE3 e box plot of MSE.5 f Estiator of ROC curve g MSE.5.4 MSE MSE MSE3 h box plot of MSE.5 i Estiator of ROC curve MSE MSE MSE3.5 Figure 3 MSE and estiation of ROC curve: The population distributions F are noral distribution Nμ,σ andg noral distribution Nμ,σ. Paraeters μ,σ,μ,σ are given, respectively,,,, in the first three plots a c for saple sizes = n = 5,,,, in the second three plots d f for saple sizes =5,n = 5,,,, in the third three plots g i for =5,= 5. 
Dash, dotted and dash-dotted curves are the MSEs of SELE, LZSE and CNEE, respectively, in the first colun figures a, d, and g; MSEi i =,, 3 in the second colun figures b, e, and h denotes the box-plots of MSEs of SELE, LZSE and CNEE; and solid curves in the third colun figures c, f, and i are the true ROC curve, dash, dotted and dashdotted curves are SELE, LZSE and CNEE, respectively. The X-axis in the first colun and third colun plots are p, <p<
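The grid-based MSE and the efficiency ratios reported in the tables and figures can be computed along the following lines. This is a minimal sketch under stated assumptions: the grid is taken to be equally spaced in $(0, 1)$, the ROC curve is taken as $R(p) = 1 - G(F^{-1}(1-p))$, and the function names `grid_mse`, `binormal_roc` and `efficiency_ratio` are hypothetical.

```python
from statistics import NormalDist


def grid_mse(estimate, truth, n_grid):
    """Average squared error over the interior grid w_i = i / (n_grid + 1)."""
    grid = [i / (n_grid + 1) for i in range(1, n_grid + 1)]
    return sum((estimate(w) - truth(w)) ** 2 for w in grid) / n_grid


def binormal_roc(p, mu1, sigma1, mu2, sigma2):
    """R(p) = 1 - G(F^{-1}(1 - p)) for F = N(mu1, sigma1^2) and G = N(mu2, sigma2^2)."""
    theta = NormalDist(mu1, sigma1).inv_cdf(1.0 - p)
    return 1.0 - NormalDist(mu2, sigma2).cdf(theta)


def efficiency_ratio(mse_other, mse_proposed):
    """Ratio such as ARE_s = MSE_s / MSE_sau; values above 1 favour the proposed estimator."""
    return mse_other / mse_proposed
```

In a simulation study, `grid_mse` would be averaged over the Monte Carlo replications for each estimator before the ratios are formed.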

Figure 4 MSE and estimation of the ROC curve. The population distribution $F$ is a mixture, with weights $\alpha$ and $1 - \alpha$, of $N(\mu_1, \sigma_1^2)$ and a normal component with shifted mean and the same variance, and $G$ is normal $N(\mu_2, \sigma_2^2)$. Panels (a)–(c), (d)–(f) and (g)–(i) correspond to three settings of the parameters $(\mu_1, \sigma_1, \mu_2, \sigma_2)$ and of the sample sizes $m$ and $n$. Panel titles: (a), (d), (g) MSE; (b), (e), (h) box plot of MSE; (c), (f), (i) estimator of ROC curve. In the first-column panels (a), (d) and (g), the dashed, dotted and dash-dotted curves are the MSEs of SELE, LZSE and CNEE, respectively. In the second-column panels (b), (e) and (h), MSEi ($i = 1, 2, 3$) denote the box plots of the MSEs of SELE, LZSE and CNEE. In the third-column panels (c), (f) and (i), the solid curve is the true ROC curve, and the dashed, dotted and dash-dotted curves are SELE, LZSE and CNEE, respectively. The x-axis in the first-column and third-column plots is $p$, $0 < p < 1$.

References
[1] J. A. Swets and R. M. Pickett, Evaluation of Diagnostic Systems: Methods from Signal Detection Theory, Academic Press, New York, 1982.
[2] G. Campbell and M. V. Ratnaparkhi, An application of Lomax distributions in receiver operating characteristic (ROC) curve analysis, Comm. Statist., 1993, 22.
[3] M. J. Goddard and I. Hinberg, Receiver operating characteristic (ROC) curves and non-normal data: An empirical study, Statist. Med., 1990, 9.
[4] F. Hsieh and B. W. Turnbull, Nonparametric and semiparametric estimation of the receiver operating characteristic curve, Ann. Statist., 1996, 24: 25–40.
[5] C. J. Lloyd, Using smoothed receiver operating characteristic curves to summarize and compare diagnostic systems, J. Amer. Stat. Assoc., 1998, 93.
[6] C. J. Lloyd and Y. Zhou, Kernel estimators of the ROC curve are better than empirical, Statist. Probab. Lett., 1999, 44: 221–228.

[7] L. Peng and X. Zhou, Local linear smoothing of receiver operating characteristic (ROC) curves, J. Statist. Plann. and Infer., 2004, 118.
[8] H. Ren, X. Zhou, and H. Liang, A flexible method for estimating the ROC curve, J. Appl. Stat., 2004, 31.
[9] G. Qin and X. Zhou, Empirical likelihood inference for the area under the ROC curve, Biometrics, 2006, 62.
[10] L. E. Chambless and G. Diao, Estimation of time-dependent area under the ROC curve for long-term risk prediction, Statist. Med., 2006, 25.
[11] M. Pepe, An interpretation for the ROC curve and inference using GLM procedures, Biometrics, 2000a, 56.
[12] T. A. Alonzo and M. S. Pepe, Distribution-free ROC analysis using binary regression techniques, Biostatistics, 2002, 3.
[13] T. Cai and M. S. Pepe, Semiparametric receiver operating characteristic analysis to evaluate biomarkers for disease, J. Amer. Statist. Assoc., 2002, 97.
[14] J. A. Hanley and B. J. McNeil, The meaning and use of the area under the receiver operating characteristic (ROC) curve, Radiology, 1982, 143.
[15] M. Pepe, A regression modelling framework for receiver operating characteristic curves in medical diagnostic testing, Biometrika, 1997, 84.
[16] M. Pepe, Three approaches to regression analysis of receiver operating characteristic curves for continuous test results, Biometrics, 1998, 54.
[17] M. Pepe, Receiver operating characteristic methodology, J. Amer. Stat. Assoc., 2000b, 95.
[18] X. Zhou, Comparing correlated areas under the ROC curves of two diagnostic tests in the presence of verification bias, Biometrics, 1998, 54.
[19] P. J. Heagerty and Y. L. Zheng, Survival model predictive accuracy and ROC curves, Biometrics, 2005, 61: 92–105.
[20] Y. Zheng, T. Cai, and Z. Feng, Application of the time-dependent ROC curves for prognostic accuracy with multiple biomarkers, Biometrics, 2006, 62.
[21] P. S. Albert, Random effects modeling approaches for estimating ROC curves from repeated ordinal tests without a gold standard, Biometrics, 2007, 63.
[22] L. Horváth, Z. Horváth, and W. Zhou, Confidence bands for ROC curves, J. Statist. Plann. Inference, 2008, 138.
[23] F. Hsieh, The empirical process approach for semiparametric two-sample models with heterogeneous treatment effect, J. R. Statist. Soc. B, 1995, 57.
[24] G. Li, R. C. Tiwari, and M. T. Wells, Semiparametric inference for a quantile comparison function with applications to receiver operating characteristic curves, Biometrika, 1999, 86.
[25] A. B. Owen, Empirical likelihood ratio confidence intervals for a single functional, Biometrika, 1988, 75.
[26] A. B. Owen, Empirical likelihood confidence regions, Ann. Statist., 1990, 18: 90–120.
[27] J. Qin and J. Lawless, Empirical likelihood and general estimating equations, Ann. Statist., 1994, 22.
[28] P. Hall and R. J. Hyndman, Improved methods for bandwidth selection when estimating ROC curves, Statist. Prob. Lett., 2003, 64.
[29] X. Zhou and J. Harezlak, Comparison of bandwidth selection methods for kernel smoothing of ROC curves, Statistics in Medicine, 2002, 21.
[30] M. P. Wand and M. C. Jones, Kernel Smoothing, Chapman and Hall, London, 1995.
[31] P. Sarda, Smoothing parameter selection for smooth distribution functions, J. Statist. Plann. and Infer., 1993, 35.
[32] M. Csörgö and P. Révész, Strong Approximations in Probability and Statistics, Academic Press, New York, 1981.

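Appendix

In this section we first give several regularity conditions on the distributions $F$ and $G$, the kernel function $k$ and the auxiliary information $\phi(x)$. These conditions are needed to prove the main results.

Assumption
(1) The density functions $f(x)$ and $g(x)$ of the distribution functions $F(x)$ and $G(x)$ are continuous at the point $x = \theta_p$ for $0 < p < 1$, where $\theta_p = F^{-1}(1-p)$;
(2) $f'(x)$ and $g'(x)$ exist and are continuous at $x = \theta_p$ for some $0 < p < 1$;
(3) $k(x)$ is a symmetric probability density function with $\int x^2 k(x)\,dx = 1$;
(4) $n h_i^4 \to 0$ as $n \to \infty$, $i = 1, 2$, and $n/m \to \rho$ as $n \to \infty$;
(5) $\Sigma = E[\phi(X)\phi^T(X)]$ is positive definite;
(6) $E[\|\phi(X)\|^4] < \infty$. Furthermore, for i.i.d. random variables $X_i$, $i = 1, 2, \ldots, n$, $E[\sup_i \|\phi(X_i)\|^4] < \infty$.

The condition $n h_i^4 \to 0$ ($i = 1, 2$) covers the optimal bandwidth for the kernel estimator of the distribution function (see [31]). In fact, the optimal bandwidth is $h_i = C n^{-1/3}$, where the positive constant $C$ depends on the true functions $F$ and $G$.

Lemma 1 Assume that $\Sigma = E[\phi(X)\phi^T(X)]$ is positive definite. Then we have

$$\lambda = \Sigma^{-1} \frac{1}{m} \sum_{i=1}^{m} \phi(X_i) + o_p(n^{-1/2}).$$

Furthermore, if $E\|\phi(X)\|^4 < \infty$, then

$$\lambda = \Sigma^{-1} \frac{1}{m} \sum_{i=1}^{m} \phi(X_i) + O_p(n^{-1}),$$

and in both cases $\sqrt{m}\,\lambda \xrightarrow{L} N(0, \Sigma^{-1})$.

The lemma can be proved by an approach similar to that of Owen [26]. By an argument similar to that of Owen [26], we can obtain the stronger result that the means of the residual terms in Lemma 1 have order $O(n^{-1})$ or $o(n^{-1/2})$ under Assumption (6).

We first derive the bias and variance of the empirical likelihood-based kernel smooth distribution estimator. That is, we prove formulae (7) and (8).

Proof of (7) and (8) Using the fact that $\lambda = O_p(n^{-1/2})$, we have

$$\hat F(x) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{1 + \lambda^T \phi(X_i)} K\Big(\frac{x - X_i}{h}\Big) = \tilde F(x) - \lambda^T T_1(x) + \lambda^T T_2(x)\lambda + o_p(m^{-1}), \tag{17}$$

where the vector $T_1(x)$ and the $r \times r$ matrix function $T_2(x)$ are defined by

$$T_1(x) = \frac{1}{m} \sum_{i=1}^{m} \phi(X_i) K\Big(\frac{x - X_i}{h}\Big), \tag{18}$$

$$T_2(x) = \frac{1}{m} \sum_{i=1}^{m} \phi(X_i)\phi^T(X_i) K\Big(\frac{x - X_i}{h}\Big). \tag{19}$$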
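The weights $1/\{m(1 + \lambda^T\phi(X_i))\}$ that enter $\hat F(x)$ can be computed numerically once $\lambda$ is known. The sketch below is an illustration only, restricted to a scalar $\phi$ such as $\phi(x) = x - \mu_0$ with an assumed known mean $\mu_0$. The function names are hypothetical, and the Newton iteration with step halving is an assumed solver for the constraint $\sum_i \phi(X_i)/(1 + \lambda\phi(X_i)) = 0$; the text does not prescribe one.

```python
from statistics import NormalDist


def el_multiplier(phi, tol=1e-12, max_iter=100):
    """Solve sum_i phi_i / (1 + lam * phi_i) = 0 for a scalar multiplier lam."""
    if not (min(phi) < 0.0 < max(phi)):
        raise ValueError("zero must lie strictly inside the range of phi")
    lam = 0.0
    for _ in range(max_iter):
        score = sum(p / (1.0 + lam * p) for p in phi)
        slope = -sum((p / (1.0 + lam * p)) ** 2 for p in phi)
        step = -score / slope  # Newton step for the decreasing score function
        # Halve the step until every implied weight stays positive.
        while any(1.0 + (lam + step) * p <= 0.0 for p in phi):
            step /= 2.0
        lam += step
        if abs(step) < tol:
            break
    return lam


def el_weights(phi, lam):
    """Empirical likelihood weights p_i = 1 / (m * (1 + lam * phi_i))."""
    m = len(phi)
    return [1.0 / (m * (1.0 + lam * p)) for p in phi]


def el_smoothed_cdf(x, sample, weights, h):
    """Weighted kernel distribution estimate: sum_i p_i * K((x - X_i) / h)."""
    kernel_cdf = NormalDist().cdf
    return sum(w * kernel_cdf((x - xi) / h) for w, xi in zip(weights, sample))
```

At the solution the weights sum to one and reproduce the auxiliary constraint exactly, so the weighted sample mean equals $\mu_0$; with $\lambda = 0$ the estimator reduces to the ordinary kernel estimator $\tilde F$.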

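Returning to (17), since $E\phi(X) = 0$, it follows from Lemma 1 that

$$E\big(\lambda^T T_1(x)\big) = \frac{1}{m} E\big[\phi^T(X)\Sigma^{-1}\phi(X) I(X \le x)\big] + o(m^{-1}), \tag{20}$$

where we have used the following formula: for any integrable function $\varphi(x)$,

$$E\Big[\varphi(X) K\Big(\frac{x - X}{h}\Big)\Big] = \psi(x) + o(h), \tag{21}$$

where $\psi(x) = \int_{-\infty}^{x} \varphi(t)\,dF(t)$. Note that

$$E\big(\lambda^T T_2(x)\lambda\big) = \frac{1}{m^3} E\Big[\sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{l=1}^{m} \phi^T(X_i)\Sigma^{-1}\phi(X_j)\phi^T(X_j) K\Big(\frac{x - X_j}{h}\Big)\Sigma^{-1}\phi(X_l)\Big] + o(m^{-1}) = I_1 + I_2 + I_3 + o(m^{-1}),$$

where $I_1$ contains the terms with $i = j = l$, $I_2$ contains the terms with $i = j \ne l$, $i = l \ne j$ and $i \ne l = j$, and $I_3$ contains the terms with $i \ne j \ne l$. It is easy to show that $I_3 = 0$ since $E\phi(X) = 0$, and

$$I_1 = \frac{1}{m^2} E\Big[\phi^T(X)\Sigma^{-1}\phi(X)\phi^T(X)\Sigma^{-1}\phi(X) K\Big(\frac{x - X}{h}\Big)\Big] = o(m^{-1}).$$

Finally, by (21), we have

$$I_2 = \frac{1}{m} E\big[\phi^T(X)\Sigma^{-1}\phi(X) I(X \le x)\big] + o(m^{-1}). \tag{22}$$

Hence, combining (17) with (20) and (22), we have

$$E\hat F(x) = E\tilde F(x) + o(m^{-1}). \tag{23}$$

To derive the variance of $\hat F(x)$, we notice that $\hat F^2(x)$ equals

$$\frac{1}{m^2} \sum_{i=1}^{m}\sum_{j=1}^{m} \frac{1}{1 + \lambda^T\phi(X_i)} \cdot \frac{1}{1 + \lambda^T\phi(X_j)} K\Big(\frac{x - X_i}{h}\Big) K\Big(\frac{x - X_j}{h}\Big) = \tilde F^2(x) - 2\lambda^T T_1 \tilde F(x) + 2\lambda^T T_2 \lambda \tilde F(x) + \lambda^T T_1 T_1^T \lambda + o_p(m^{-1}), \tag{24}$$

where $T_1$ and $T_2$ are defined in (18) and (19). By some tedious calculation, with $A(x) = E[\phi(X) I(X \le x)]$, we obtain

$$E\big(\lambda^T T_1 \tilde F(x)\big) = \frac{1}{m} A^T(x)\Sigma^{-1}A(x) + \frac{1}{m} E\big[\phi^T(X)\Sigma^{-1}\phi(X) I(X \le x)\big] F(x) + o(m^{-1}), \tag{25}$$

$$E\big(\lambda^T T_2 \lambda \tilde F(x)\big) = \frac{1}{m} E\big[\phi^T(X)\Sigma^{-1}\phi(X) I(X \le x)\big] F(x) + o(m^{-1}), \tag{26}$$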

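and

$$E\big(\lambda^T T_1 T_1^T \lambda\big) = \frac{1}{m} A^T(x)\Sigma^{-1}A(x) + o(m^{-1}). \tag{27}$$

Combining (24), (25) and (26) with (27), we have

$$E\hat F^2(x) = E\tilde F^2(x) - \frac{1}{m} A^T(x)\Sigma^{-1}A(x) + o(m^{-1}). \tag{28}$$

Hence

$$\mathrm{Var}\,\hat F(x) = \mathrm{Var}\,\tilde F(x) - \frac{1}{m} A^T(x)\Sigma^{-1}A(x) + o(m^{-1}). \tag{29}$$

This is formula (8).

Lemma 2 Assume that conditions (3)–(5) in the Appendix are satisfied, and that $F(x)$ is continuously differentiable at the point $\theta_p = F^{-1}(1-p)$ for $0 < p < 1$. Let $\hat\theta_p = \hat F^{-1}(1-p)$ be an estimator of the true upper quantile $\theta_p$ of $F$. Then $\hat\theta_p \to \theta_p$ almost surely, and

$$\sqrt{m}\,(\hat\theta_p - \theta_p) \xrightarrow{L} N(0, \gamma^2),$$

where

$$\gamma^2 = \frac{1}{f^2(\theta_p)} \Big[ p(1-p) - \int_{-\infty}^{\theta_p} \phi^T(x)\,dF(x)\, \Sigma^{-1} \int_{-\infty}^{\theta_p} \phi(x)\,dF(x) \Big].$$

Proof Recall that $\theta_p = F^{-1}(1-p)$ and $\hat\theta_p = \hat F^{-1}(1-p)$. Consider the estimating equation

$$\xi(\theta, X) = \hat F(\theta) - 1 + p = 0.$$

Obviously $\hat\theta_p$ is the solution of this estimating equation, and $\xi(\theta, X)$ is nondecreasing in $\theta$ because $\hat F$ is a distribution function. Thus, for every $y$,

$$P\big(\xi(y_n, X) > 0\big) \le P\big(\sqrt{m}(\hat\theta_p - \theta_p)/\sigma_p \le y\big) = P\big(\hat\theta_p \le y_n\big) \le P\big(\xi(y_n, X) \ge 0\big),$$

where $y_n = \theta_p + \sigma_p y/\sqrt{m}$ and $\sigma_p = \gamma$. Thus, to prove the lemma, it is enough to show that for every $y$,

$$\lim_{n\to\infty} P\big(\xi(y_n, X) > 0\big) = \lim_{n\to\infty} P\big(\xi(y_n, X) \ge 0\big) = \Phi(y), \tag{30}$$

where $\Phi$ is the standard normal distribution function. Note that

$$\xi(y_n, X) = \frac{1}{m}\sum_{i=1}^{m} K\Big(\frac{y_n - X_i}{h}\Big) - \lambda^T \frac{1}{m}\sum_{i=1}^{m} \phi(X_i) K\Big(\frac{y_n - X_i}{h}\Big) + \frac{1}{m}\sum_{i=1}^{m} \frac{\lambda^T\phi(X_i)\phi^T(X_i)\lambda}{1 + \lambda^T\phi(X_i)} K\Big(\frac{y_n - X_i}{h}\Big) - 1 + p.$$

By Lemma 1, $\lambda = O_p(n^{-1/2})$ implies that the third term on the right-hand side of the above formula is negligible. Again using Lemma 1, the second term on the right-hand side is

$$\lambda^T \frac{1}{m}\sum_{i=1}^{m} \phi(X_i) K\Big(\frac{y_n - X_i}{h}\Big) = E\big[\phi(X) I(X \le \theta_p)\big]^T \Sigma^{-1} \frac{1}{m}\sum_{i=1}^{m} \phi(X_i) + o_p(m^{-1/2}).$$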

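Write

$$Z_i = K\Big(\frac{y_n - X_i}{h}\Big) - A^T(\theta_p)\Sigma^{-1}\phi(X_i) - (1 - p).$$

Hence,

$$\xi(y_n, X) = \bar Z + o_p(m^{-1/2}), \qquad \bar Z = \frac{1}{m}\sum_{i=1}^{m} Z_i.$$

To prove (30), it is sufficient to show that

$$\lim P\big(\bar Z \ge 0\big) = \Phi(y). \tag{31}$$

To show (31), we need to normalize the sum of i.i.d. random variables in (31); equivalently, (31) is

$$\lim P\Big( \frac{\sum_{i=1}^{m} (Z_i - EZ_i)}{\sqrt{m\,\mathrm{Var}\,Z_i}} \ge -\frac{\sqrt{m}\,EZ_1}{\sqrt{\mathrm{Var}\,Z_1}} \Big) = \Phi(y).$$

Since $m h^4 \to 0$ as $m \to \infty$, we have

$$EZ_i = f(\theta_p)\sigma_p y/\sqrt{m} + o(m^{-1/2}) \quad\text{and}\quad \mathrm{Var}\,Z_i = f^2(\theta_p)\sigma_p^2 + o(1),$$

so the threshold on the right-hand side of the inequality tends to $-y$. Let $T_i = Z_i - EZ_i$. Since the $T_i$, $i \ge 1$, are independent and identically distributed with mean 0, a standard central limit theorem for i.i.d. random variables implies

$$\lim P\Big( \frac{\sum_{i=1}^{m} T_i}{\sqrt{m\,\mathrm{Var}\,T_i}} \le y \Big) = \Phi(y),$$

which, by the symmetry of the normal law, implies (31). This completes the proof of Lemma 2.

Lemma 3 Assume that conditions (1)–(4) are satisfied. Then

$$E\tilde G_n(x) = G(x) + \frac{h^2}{2} G''(x) \int x^2 k(x)\,dx + o(h^2), \tag{32}$$

$$\mathrm{Var}\,\tilde G_n(x) = \frac{1}{n}\Big[ G(x)\big(1 - G(x)\big) - 2h\,G'(x)\int x k(x) K(x)\,dx + h^2 G''(x)\big(1 - 2G(x)\big)\int x^2 k(x) K(x)\,dx + O(h^3) \Big]. \tag{33}$$

Proof It is easy to obtain the asymptotic mean (32) and asymptotic variance (33) of $\tilde G_n(x)$ for small $h$ using the methods illustrated in [31].

Proof of Theorem 1 The proofs are similar to those of [5–6], so we only give an outline. To derive the bias and asymptotic variance of $\hat R(p)$, we first need expressions for the bias and variance of $\hat\theta_p = \hat F^{-1}(1-p)$, which is the solution of $\hat F(\theta) - 1 + p = \xi(\theta, X) = 0$. Using standard estimating equation theory, the asymptotic variance of the solution is

$$\mathrm{Var}\,\hat\theta_p = \frac{\mathrm{Var}\,\hat F(\theta_p)}{\big[E\,\xi'(\theta_p, X)\big]^2},$$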

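where $\theta_p = F^{-1}(1-p)$. Since $\xi'(\theta, X) = \hat f(\theta)$, where

$$\hat f(\theta) = \frac{1}{m h_1}\sum_{i=1}^{m} \frac{1}{1 + \lambda^T\phi(X_i)}\, k\Big(\frac{\theta - X_i}{h_1}\Big)$$

has mean $f(\theta) + o(h_1) + o(m^{-1/2})$, it follows from (29) and Lemma 3 that

$$\mathrm{Var}\,\hat\theta_p = \frac{p(1-p)}{m f^2(\theta_p)} - \frac{h_1\alpha}{m f(\theta_p)} - \frac{A^T(\theta_p)\Sigma^{-1}A(\theta_p)}{m f^2(\theta_p)} + o(m^{-1}), \tag{34}$$

where $\alpha = 2\int x k(x) K(x)\,dx$. Here $h_1$ and $h_2$ denote the bandwidths used for $\hat F$ and $\tilde G_n$, respectively. The bias of $\hat\theta_p$ follows from a second-order expansion of $\xi(\theta, X)$, leading to

$$E\,\xi(\theta_p, X) = -f(\theta_p)\,\mathrm{bias}(\hat\theta_p) - \tfrac{1}{2} f'(\theta_p)\,E(\hat\theta_p - \theta_p)^2 + o(m^{-1} + h_1^2),$$

and we have $E(\hat\theta_p - \theta_p)^2 = \mathrm{Var}\,\hat\theta_p + o(h_1^2)$ and

$$\mathrm{bias}(\hat\theta_p) = -\frac{f'(\theta_p)}{2 f(\theta_p)}\big[h_1^2 + \mathrm{Var}\,\hat\theta_p\big] + o(m^{-1} + h_1^2).$$

Since $\tilde G_n$ is independent of $\hat\theta_p$, we can easily obtain the conditional mean and the conditional variance of $\hat R(p) = 1 - \tilde G_n(\hat\theta_p)$ given $\hat\theta_p$ from (32) and (33), i.e.,

$$E\big(\hat R(p) \mid \hat\theta_p\big) = 1 - G(\hat\theta_p) - \frac{h_2^2}{2}\, g'(\hat\theta_p) + o(h_2^2),$$

$$n\,\mathrm{Var}\big(\hat R(p) \mid \hat\theta_p\big) = G(\hat\theta_p)\big(1 - G(\hat\theta_p)\big) - h_2\, g(\hat\theta_p)\,\alpha + O(h_2^2). \tag{35}$$

Dropping the argument $\theta_p$, we can derive that

$$E\hat R(p) = E\big[E\big(\hat R(p) \mid \hat\theta_p\big)\big] = R(p) + \frac{g f' h_1^2 - f g' h_2^2}{2 f} + \frac{\mathrm{Var}\,\hat\theta_p}{2 f}\,(g f' - f g') + o(m^{-1} + h_1^2 + h_2^2).$$

Since $R''(p) = (g f' - f g')/f^3$ evaluated at $\theta_p$, this can be written as

$$E\hat R(p) = R(p) + \frac{R''(p)}{2}\Big[h_1^2 f^2(\theta_p) + \frac{p(1-p) - A^T(\theta_p)\Sigma^{-1}A(\theta_p)}{m}\Big] + \frac{h_1^2 - h_2^2}{2}\, g'(\theta_p) + o(m^{-1} + h_1^2 + h_2^2).$$

This completes the proof of the bias formula. Note that

$$\mathrm{Var}\,\hat R(p) = \mathrm{Var}\big\{E\big(\hat R(p) \mid \hat\theta_p\big)\big\} + E\big\{\mathrm{Var}\big(\hat R(p) \mid \hat\theta_p\big)\big\}.$$

Now we derive formula (9). From (35), it follows that

$$n\,E\big[\mathrm{Var}\big(\hat R(p) \mid \hat\theta_p\big)\big] = G(\theta_p)\big(1 - G(\theta_p)\big) + g(\theta_p)\big[1 - 2G(\theta_p)\big]\,\mathrm{bias}(\hat\theta_p) - h_2\,\alpha\, g(\theta_p) + o(m^{-1} + h_2),$$

$$\mathrm{Var}\big\{E\big(\hat R(p) \mid \hat\theta_p\big)\big\} = \mathrm{Var}\,G(\hat\theta_p) + h_2^2\,\mathrm{cov}\big(G(\hat\theta_p), g'(\hat\theta_p)\big) = \big[g^2(\theta_p) + O(h_2^2)\big]\,\mathrm{Var}\,\hat\theta_p.$$

This completes the proof of (9).

Proof of Theorem 2 From the variance formula (9) and the bias formula, we easily see that $\hat R(p) \to R(p)$ in probability. Consider the following estimating equation:

$$\xi(\theta, X) = \hat F(\theta) - 1 + p = 0.$$

Obviously, by the definition of $\hat\theta_p$, it follows that $\xi(\hat\theta_p, X) = 0$. Hence, we have

$$-\xi(\theta_p, X) = \xi'(\theta_p, X)(\hat\theta_p - \theta_p) + \tfrac{1}{2}\,\xi''(\theta_p^*, X)(\hat\theta_p - \theta_p)^2, \tag{36}$$

where $\theta_p^*$ lies between $\theta_p$ and $\hat\theta_p$. By the definition of $\hat F$, we have

$$E\,\xi(\theta_p, X) = \frac{h^2}{2} f'(\theta_p)\int x^2 k(x)\,dx + o(h^2). \tag{37}$$

On the other hand, as $n \to \infty$ and $h \to 0$,

$$\frac{1}{h}\int k\Big(\frac{\theta_p - x}{h}\Big)\,dF(x) \to f(\theta_p) \quad \text{a.s.}, \tag{38}$$

and $f(\theta_p) > 0$. By a proof similar to that of (17), we have

$$\hat F(x) = \tilde F(x) - \lambda^T T_1(x) + \lambda^T T_2(x)\lambda + O\big((m^{-1}\log\log m)^{3/2}\big) \quad \text{a.s.}, \tag{39}$$

where $T_1(x)$ and $T_2(x)$ are defined in (18) and (19).

We start by proving $\hat R(p) \to R(p)$ a.s. and the second result of Theorem 2. We can show by some tedious calculations that

$$\lambda = O\big(n^{-1/2}(\log\log n)^{1/2}\big) \quad \text{a.s.} \tag{40}$$

From (39) and the definition of $\xi(\theta_p, X)$, we obtain

$$\xi(\theta_p, X) = \tilde F(\theta_p) - \lambda^T T_1(\theta_p) + \lambda^T T_2(\theta_p)\lambda - 1 + p + O\big((m^{-1}\log\log m)^{3/2}\big) \quad \text{a.s.}$$

Hence

$$\xi(\theta_p, X) - E\,\xi(\theta_p, X) = \tilde F(\theta_p) - \int K\Big(\frac{\theta_p - x}{h}\Big)\,dF(x) - \lambda^T T_1(\theta_p) + \lambda^T T_2(\theta_p)\lambda + O\big((m^{-1}\log\log m)^{3/2}\big) \quad \text{a.s.}$$

Integration by parts and the law of the iterated logarithm for the empirical process imply

$$\tilde F(\theta_p) - \int K\Big(\frac{\theta_p - x}{h}\Big)\,dF(x) = O\big((m^{-1}\log\log m)^{1/2}\big) \quad \text{a.s.}$$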

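From (39), it follows that

$$\lambda^T T_1(\theta_p) = O\big((m^{-1}\log\log m)^{1/2}\big) \quad \text{a.s.} \qquad\text{and}\qquad \lambda^T T_2(\theta_p)\lambda = O\big(m^{-1}\log\log m\big) \quad \text{a.s.}$$

Therefore

$$\xi(\theta_p, X) = O\big((m^{-1}\log\log m)^{1/2}\big) + O(h^2) \quad \text{a.s.} \tag{41}$$

The derivative of $\xi(\theta_p, X)$ is

$$\xi'(\theta_p, X) = \frac{1}{mh}\sum_{i=1}^{m} k\Big(\frac{\theta_p - X_i}{h}\Big) + O\big((m^{-1}\log\log m)^{1/2}\big) \quad \text{a.s.}$$

By (38), we have $\xi'(\theta_p, X) \to f(\theta_p)$ a.s. Combining (36) with (41), we have $\hat\theta_p \to \theta_p$ a.s. Therefore, $\hat R(p) \to R(p)$ a.s. In fact, following the proof of the strong consistency of $\hat\theta_p$, we can retrace the arguments for (36) and (41) to show that

$$\hat\theta_p - \theta_p = O\big((m^{-1}\log\log m)^{1/2}\big) \quad \text{a.s.} \tag{42}$$

A simple property of the kernel smooth estimator of $G(x)$ is that

$$\sup_x \big|\tilde G_n(x) - G(x)\big| = O\big(n^{-1/2}(\log\log n)^{1/2}\big) \quad \text{a.s.}$$

Then by (42), we have

$$\hat R(p) - R(p) = O\big(n^{-1/2}(\log\log n)^{1/2}\big) \quad \text{a.s.}$$

This completes the proof of the second result of Theorem 2.

Proof of Theorem 3 Let $q = 1 - p$, and note that

$$\hat R(p) - R(p) = \big[G(\hat F^{-1}(q)) - \tilde G_n(\hat F^{-1}(q)) + \tilde G_n(F^{-1}(q)) - G(F^{-1}(q))\big] + \big[G(F^{-1}(q)) - \tilde G_n(F^{-1}(q))\big] + \big[G(F^{-1}(q)) - G(\hat F^{-1}(q))\big] = I_1 + I_2 + I_3.$$

To derive the asymptotic normality of $\hat R(p)$, we need the following results:

$$\sqrt{n}\, I_1 \to 0 \ \text{in probability}, \qquad \sqrt{n}\, I_2 \xrightarrow{L} N(0, \sigma_1^2), \qquad \sqrt{m}\, I_3 \xrightarrow{L} N(0, \sigma_2^2),$$

where $\sigma_1^2 = G(F^{-1}(q))\big[1 - G(F^{-1}(q))\big]$ and

$$\sigma_2^2 = \frac{g^2(\theta_p)}{f^2(\theta_p)}\Big[p(1-p) - \int_{-\infty}^{\theta_p}\phi^T(x)\,dF(x)\,\Sigma^{-1}\int_{-\infty}^{\theta_p}\phi(x)\,dF(x)\Big].$$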
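To get a feel for the size of the correction term in $\sigma_2^2$ (and in $\gamma^2$ of Lemma 2), the following worked special case may help. It is an illustration under assumptions that are not made in the text: $F = N(\mu, \sigma^2)$, and the auxiliary information is the known mean, so that $\phi(x) = x - \mu$ is scalar. The symbol $z_p$ is introduced here only for this example.

```latex
\begin{align*}
\Sigma &= E\big[(X-\mu)^2\big] = \sigma^2, \\
\int_{-\infty}^{\theta_p} \phi(x)\,dF(x)
  &= \int_{-\infty}^{\theta_p} (x-\mu) f(x)\,dx = -\sigma^2 f(\theta_p), \\
\int_{-\infty}^{\theta_p} \phi^T(x)\,dF(x)\;\Sigma^{-1} \int_{-\infty}^{\theta_p} \phi(x)\,dF(x)
  &= \sigma^2 f^2(\theta_p) = \frac{1}{2\pi}\, e^{-z_p^2},
  \qquad z_p = \frac{\theta_p - \mu}{\sigma}, \\
\sigma_2^2 &= \frac{g^2(\theta_p)}{f^2(\theta_p)}
  \Big[\, p(1-p) - \frac{1}{2\pi}\, e^{-z_p^2} \Big].
\end{align*}
% At p = 1/2 we have z_p = 0, so the bracket equals 0.25 - 0.1592 = 0.0908,
% i.e. about 36% of the value p(1-p) = 0.25 obtained without auxiliary information.
```

The reduction is largest near the centre of $F$ and vanishes in the tails, where $e^{-z_p^2}$ decays faster than $p(1-p)$.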


More information

A Bernstein-Markov Theorem for Normed Spaces

A Bernstein-Markov Theorem for Normed Spaces A Bernstein-Markov Theore for Nored Spaces Lawrence A. Harris Departent of Matheatics, University of Kentucky Lexington, Kentucky 40506-0027 Abstract Let X and Y be real nored linear spaces and let φ :

More information

Shannon Sampling II. Connections to Learning Theory

Shannon Sampling II. Connections to Learning Theory Shannon Sapling II Connections to Learning heory Steve Sale oyota echnological Institute at Chicago 147 East 60th Street, Chicago, IL 60637, USA E-ail: sale@athberkeleyedu Ding-Xuan Zhou Departent of Matheatics,

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words)

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words) 1 A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine (1900 words) Contact: Jerry Farlow Dept of Matheatics Univeristy of Maine Orono, ME 04469 Tel (07) 866-3540 Eail: farlow@ath.uaine.edu

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

arxiv: v1 [cs.ds] 3 Feb 2014

arxiv: v1 [cs.ds] 3 Feb 2014 arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

GEE ESTIMATORS IN MIXTURE MODEL WITH VARYING CONCENTRATIONS

GEE ESTIMATORS IN MIXTURE MODEL WITH VARYING CONCENTRATIONS ACTA UIVERSITATIS LODZIESIS FOLIA OECOOMICA 3(3142015 http://dx.doi.org/10.18778/0208-6018.314.03 Olesii Doronin *, Rostislav Maiboroda ** GEE ESTIMATORS I MIXTURE MODEL WITH VARYIG COCETRATIOS Abstract.

More information

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory Pseudo-arginal Metropolis-Hastings: a siple explanation and (partial) review of theory Chris Sherlock Motivation Iagine a stochastic process V which arises fro soe distribution with density p(v θ ). Iagine

More information

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs International Cobinatorics Volue 2011, Article ID 872703, 9 pages doi:10.1155/2011/872703 Research Article On the Isolated Vertices and Connectivity in Rando Intersection Graphs Yilun Shang Institute for

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Testing the lag length of vector autoregressive models: A power comparison between portmanteau and Lagrange multiplier tests

Testing the lag length of vector autoregressive models: A power comparison between portmanteau and Lagrange multiplier tests Working Papers 2017-03 Testing the lag length of vector autoregressive odels: A power coparison between portanteau and Lagrange ultiplier tests Raja Ben Hajria National Engineering School, University of

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

An Approximate Model for the Theoretical Prediction of the Velocity Increase in the Intermediate Ballistics Period

An Approximate Model for the Theoretical Prediction of the Velocity Increase in the Intermediate Ballistics Period An Approxiate Model for the Theoretical Prediction of the Velocity... 77 Central European Journal of Energetic Materials, 205, 2(), 77-88 ISSN 2353-843 An Approxiate Model for the Theoretical Prediction

More information

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies OPERATIONS RESEARCH Vol. 52, No. 5, Septeber October 2004, pp. 795 803 issn 0030-364X eissn 1526-5463 04 5205 0795 infors doi 10.1287/opre.1040.0130 2004 INFORMS TECHNICAL NOTE Lost-Sales Probles with

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

A general forulation of the cross-nested logit odel Michel Bierlaire, Dpt of Matheatics, EPFL, Lausanne Phone: Fax:

A general forulation of the cross-nested logit odel Michel Bierlaire, Dpt of Matheatics, EPFL, Lausanne Phone: Fax: A general forulation of the cross-nested logit odel Michel Bierlaire, EPFL Conference paper STRC 2001 Session: Choices A general forulation of the cross-nested logit odel Michel Bierlaire, Dpt of Matheatics,

More information

Compression and Predictive Distributions for Large Alphabet i.i.d and Markov models

Compression and Predictive Distributions for Large Alphabet i.i.d and Markov models 2014 IEEE International Syposiu on Inforation Theory Copression and Predictive Distributions for Large Alphabet i.i.d and Markov odels Xiao Yang Departent of Statistics Yale University New Haven, CT, 06511

More information

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup)

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup) Recovering Data fro Underdeterined Quadratic Measureents (CS 229a Project: Final Writeup) Mahdi Soltanolkotabi Deceber 16, 2011 1 Introduction Data that arises fro engineering applications often contains

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Simultaneous critical values for t-tests in very high dimensions

Simultaneous critical values for t-tests in very high dimensions Bernoulli 17(1, 2011, 347 394 DOI: 10.3150/10-BEJ272 Siultaneous critical values for t-tests in very high diensions HONGYUAN CAO 1 and MICHAEL R. KOSOROK 2 1 Departent of Health Studies, 5841 South Maryland

More information

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material Consistent Multiclass Algoriths for Coplex Perforance Measures Suppleentary Material Notations. Let λ be the base easure over n given by the unifor rando variable (say U over n. Hence, for all easurable

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

Generalized Augmentation for Control of the k-familywise Error Rate

Generalized Augmentation for Control of the k-familywise Error Rate International Journal of Statistics in Medical Research, 2012, 1, 113-119 113 Generalized Augentation for Control of the k-failywise Error Rate Alessio Farcoeni* Departent of Public Health and Infectious

More information

Robustness and Regularization of Support Vector Machines

Robustness and Regularization of Support Vector Machines Robustness and Regularization of Support Vector Machines Huan Xu ECE, McGill University Montreal, QC, Canada xuhuan@ci.cgill.ca Constantine Caraanis ECE, The University of Texas at Austin Austin, TX, USA

More information

An Introduction to Meta-Analysis

An Introduction to Meta-Analysis An Introduction to Meta-Analysis Douglas G. Bonett University of California, Santa Cruz How to cite this work: Bonett, D.G. (2016) An Introduction to Meta-analysis. Retrieved fro http://people.ucsc.edu/~dgbonett/eta.htl

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval Unifor Approxiation and Bernstein Polynoials with Coefficients in the Unit Interval Weiang Qian and Marc D. Riedel Electrical and Coputer Engineering, University of Minnesota 200 Union St. S.E. Minneapolis,

More information

A Small-Sample Estimator for the Sample-Selection Model

A Small-Sample Estimator for the Sample-Selection Model A Sall-Saple Estiator for the Saple-Selection Model by Aos Golan, Enrico Moretti, and Jeffrey M. Perloff October 2000 ABSTRACT A seiparaetric estiator for evaluating the paraeters of data generated under

More information

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data Suppleentary to Learning Discriinative Bayesian Networks fro High-diensional Continuous Neuroiaging Data Luping Zhou, Lei Wang, Lingqiao Liu, Philip Ogunbona, and Dinggang Shen Proposition. Given a sparse

More information

Sharp sensitivity bounds for mediation under unmeasured mediator-outcome confounding

Sharp sensitivity bounds for mediation under unmeasured mediator-outcome confounding Bioetrika (2016), 103,2,pp. 483 490 doi: 10.1093/bioet/asw012 Printed in Great Britain Advance Access publication 29 April 2016 Sharp sensitivity bounds for ediation under uneasured ediator-outcoe confounding

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

Non-uniform Berry Esseen Bounds for Weighted U-Statistics and Generalized L-Statistics

Non-uniform Berry Esseen Bounds for Weighted U-Statistics and Generalized L-Statistics Coun Math Stat 0 :5 67 DOI 0.007/s4004-0-009- Non-unifor Berry Esseen Bounds for Weighted U-Statistics and Generalized L-Statistics Haojun Hu Qi-Man Shao Received: 9 August 0 / Accepted: Septeber 0 / Published

More information

Comparing Probabilistic Forecasting Systems with the Brier Score

Comparing Probabilistic Forecasting Systems with the Brier Score 1076 W E A T H E R A N D F O R E C A S T I N G VOLUME 22 Coparing Probabilistic Forecasting Systes with the Brier Score CHRISTOPHER A. T. FERRO School of Engineering, Coputing and Matheatics, University

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October

More information

E. Alpaydın AERFAISS

E. Alpaydın AERFAISS E. Alpaydın AERFAISS 00 Introduction Questions: Is the error rate of y classifier less than %? Is k-nn ore accurate than MLP? Does having PCA before iprove accuracy? Which kernel leads to highest accuracy

More information

Supplement to: Subsampling Methods for Persistent Homology

Supplement to: Subsampling Methods for Persistent Homology Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher

More information

Bayesian Approach for Fatigue Life Prediction from Field Inspection

Bayesian Approach for Fatigue Life Prediction from Field Inspection Bayesian Approach for Fatigue Life Prediction fro Field Inspection Dawn An and Jooho Choi School of Aerospace & Mechanical Engineering, Korea Aerospace University, Goyang, Seoul, Korea Srira Pattabhiraan

More information

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence Best Ar Identification: A Unified Approach to Fixed Budget and Fixed Confidence Victor Gabillon Mohaad Ghavazadeh Alessandro Lazaric INRIA Lille - Nord Europe, Tea SequeL {victor.gabillon,ohaad.ghavazadeh,alessandro.lazaric}@inria.fr

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair Proceedings of the 6th SEAS International Conference on Siulation, Modelling and Optiization, Lisbon, Portugal, Septeber -4, 006 0 A Siplified Analytical Approach for Efficiency Evaluation of the eaving

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

Complex Quadratic Optimization and Semidefinite Programming

Complex Quadratic Optimization and Semidefinite Programming Coplex Quadratic Optiization and Seidefinite Prograing Shuzhong Zhang Yongwei Huang August 4 Abstract In this paper we study the approxiation algoriths for a class of discrete quadratic optiization probles

More information

The Distribution of the Covariance Matrix for a Subset of Elliptical Distributions with Extension to Two Kurtosis Parameters

The Distribution of the Covariance Matrix for a Subset of Elliptical Distributions with Extension to Two Kurtosis Parameters journal of ultivariate analysis 58, 96106 (1996) article no. 0041 The Distribution of the Covariance Matrix for a Subset of Elliptical Distributions with Extension to Two Kurtosis Paraeters H. S. Steyn

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

A Smoothed Boosting Algorithm Using Probabilistic Output Codes

A Smoothed Boosting Algorithm Using Probabilistic Output Codes A Soothed Boosting Algorith Using Probabilistic Output Codes Rong Jin rongjin@cse.su.edu Dept. of Coputer Science and Engineering, Michigan State University, MI 48824, USA Jian Zhang jian.zhang@cs.cu.edu

More information