Spiking problem in monotone regression: penalized residual sum of squares

Jayanta Kumar Pal [1][2]
SAMSI, NC 27606, U.S.A.

[1] 19, T.W. Alexander Drive, Research Triangle Park, NC 27709, U.S.A., jpal@samsi.info
[2] Research supported by NSF-SAMSI cooperative agreement No. DMS-0112069

Abstract

We consider the estimation of a monotone regression function at its end-point, where the least squares solution is inconsistent. We penalize the least squares criterion to achieve consistency. We also derive the limit distribution of the residual sum of squares, which can be used to construct confidence intervals.

Keywords and phrases: monotone regression, penalization, residual sum of squares, spiking problem.

1 Introduction

Monotone regression has gained statistical importance over the last few decades in several scientific problems, e.g. dose-response experiments in biology, modeling disease incidence as a function of toxicity level, and industrial experiments such as the effect of temperature on the strength of steel. The least squares estimator (LSE henceforth) for this problem is described by Robertson et al. (1987). We refer to Wright (1981) for its derivation and Groeneboom and Wellner (2001) for its distribution. In many of these problems, including the ones mentioned above, the value of the regression function at one of the end-points of the domain of the covariate is of special interest.

For example, the response level in the absence of any dose of medicine, or the disease incidence at zero toxicity level (known as the base level or placebo), is a quantity that needs specific attention. However, the LSE fails to be consistent at the end-points and has a systematic, non-negligible asymptotic bias. This is known as the spiking problem. A similar problem arises in the estimation of decreasing densities, where the maximum likelihood estimator is inconsistent at its left end-point. This was addressed by Woodroofe and Sun (1993) using a penalization of the likelihood. Recently, Pal (2006, Chapter 5) investigated the behavior of the likelihood ratio in the presence of penalization in that density estimation setting. Analogously, here we apply the penalization to the least squares criterion and investigate the estimator at the end-point and the corresponding residual sum of squares.

In Section 2 we formulate the penalized least squares problem and characterize the estimators. We also state the main result on the convergence of the penalized residual sum of squares to a limit distribution. In Section 3 we present the proofs of the key propositions.

2 Main results

2.1 Preliminaries

We consider the problem of estimating the value of an increasing regression function at its left end-point. The problem of decreasing functions, or of estimation at the right end-point, follows similarly by reflecting the regression on the vertical or horizontal axes. For simplicity, we restrict ourselves to the unit interval as the support of the covariate and take the design points as equally spaced, viz. $x_i = i/n$. Suppose we have observations $Y_1, Y_2, \ldots, Y_n$ at those design points such that $Y_i = \mu(x_i) + \epsilon_i$ for $i = 1, \ldots, n$, where the $\epsilon_i$ are independent, identically distributed errors with zero mean and common variance $\sigma^2$.

Assuming that the regression function $\mu$ is increasing on $[0,1]$, the least squares estimate (LSE) of $\mu$ (the MLE in case the $\epsilon_i$ are Gaussian), which minimizes $\sum_{i=1}^n (Y_i - \mu(x_i))^2$, is the Pool-Adjacent-Violators Algorithm (PAVA henceforth) estimator
$$\tilde\mu(x_k) = \min_{s \ge k}\,\max_{r \le k}\,\frac{Y_r + \cdots + Y_s}{s - r + 1}.$$
Define $\mu_i = \mu(x_i)$ and $\tilde\mu_i = \tilde\mu(x_i)$ for $i \ge 1$. Then $\tilde\mu(0+) = \tilde\mu(x_1) = \min_{r \ge 1}(Y_1 + \cdots + Y_r)/r$ is inconsistent for $\mu(0+)$. In particular, $P(\tilde\mu(0+) < \mu(0+)) \to 1$, always yielding a negative, non-negligible bias.

2.2 Penalization of the least squares

One way to counter the problem is to penalize too low an estimate of $\mu(0+)$. In particular, we look at the penalized least squares criterion
$$\sum_{i=1}^n (Y_i - \mu_i)^2 - 2n\alpha\mu_1 \qquad (1)$$
where $\alpha = \alpha_n$ is a penalty parameter that depends on the sample size $n$. For simplicity, we make that dependence implicit.

To start with, we define the cusum diagram $F$ of the data as follows. Let $F_r = (Y_1 + \cdots + Y_r)/n$ for $r = 1, \ldots, n$, and let $F$ be the piecewise linear interpolant of $F_1, \ldots, F_n$ on the interval $[0,1]$ such that $F(i/n) = F_i$. The following characterization of the penalized least squares estimator, denoted $\hat\mu$, is similar to the estimator of Woodroofe and Sun (1993). The proof is presented in Section 3.

Proposition 1. $\hat\mu$, the minimizer of (1), can be characterized as
$$\hat\mu(x_k) = \begin{cases} \min_{0 < x \le 1} \dfrac{F(x) + \alpha}{x}, & \text{if } k \le m_0, \\[4pt] \tilde\mu(x_k), & \text{if } k > m_0, \end{cases}$$
where $m_0 = n \cdot \operatorname{argmin}_{0 < x \le 1} (F(x) + \alpha)/x$. In particular, $\hat\mu(0+) = \min_r \dfrac{F_r + \alpha}{r/n}$.
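Proposition 1 makes the penalized fit easy to compute: $\tilde\mu$ is the ordinary PAVA solution, and the penalized end-point value is simply the minimum of $(F_r + \alpha)/(r/n)$ over $r$. A minimal numerical sketch of both (in Python with NumPy; the function names, the toy regression function and the illustrative choice of $\alpha$ are ours, not the paper's) might look as follows.

```python
import numpy as np

def pava(y):
    """Pool-Adjacent-Violators: isotonic (increasing) least squares fit."""
    sums, cnts = [], []                 # block sums and block sizes
    for v in y:
        sums.append(float(v)); cnts.append(1)
        # merge adjacent blocks while their means violate monotonicity
        while len(sums) > 1 and sums[-2] / cnts[-2] > sums[-1] / cnts[-1]:
            sums[-2] += sums[-1]; cnts[-2] += cnts[-1]
            sums.pop(); cnts.pop()
    return np.concatenate([np.full(c, s / c) for s, c in zip(sums, cnts)])

def penalized_endpoint(y, alpha):
    """hat(mu)(0+) = min_r (F_r + alpha) / (r/n), per Proposition 1."""
    n = len(y)
    F = np.cumsum(y) / n                # cusum diagram F_r = (Y_1 + ... + Y_r)/n
    r = np.arange(1, n + 1)
    return np.min((F + alpha) / (r / n))

# toy example: mu(x) = 1 + x^2 on [0, 1] with Gaussian noise
rng = np.random.default_rng(0)
n = 200
x = np.arange(1, n + 1) / n
y = 1.0 + x**2 + rng.normal(scale=0.1, size=n)

mu_tilde = pava(y)                      # ordinary LSE: biased downward at the left end
alpha = 0.05                            # illustrative only; Theorem 1 below takes alpha of order n^(-2/3)
print(mu_tilde[0], penalized_endpoint(y, alpha))

# equivalent computation (the device used in the proof of Proposition 1):
# run PAVA on the shifted data (Y_1 + n*alpha, Y_2, ..., Y_n)
y_shift = y.copy(); y_shift[0] += n * alpha
assert np.isclose(pava(y_shift)[0], penalized_endpoint(y, alpha))
```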

The goal is to construct confidence sets for $\mu(0+)$, and for that we need the sampling distribution of the estimate $\hat\mu(0+)$. The estimator achieves consistency, and its limit distribution is derived subsequently. Alternatively, we can use the residual sum of squares. However, the regular RSS is not relevant here because of the penalization. Under the constraint $\mu(0+) = c$, a specific value, the penalized sum of squares will increase and reflect the effect of $\mu(0+)$. Hence, we focus on the difference between the constrained and unconstrained penalized sums of squares, and conclude that a large difference indicates a lack of evidence in favor of the hypothesis $\mu(0+) = c$. To be more precise, we measure
$$Q = \left[\min_{\mu,\,\mu_1 = c}\sum_{i=1}^n (Y_i - \mu_i)^2 - 2n\alpha c\right] - \left[\min_{\mu}\sum_{i=1}^n (Y_i - \mu_i)^2 - 2n\alpha\mu_1\right]$$
as a test statistic for that hypothesis.

To find the constrained (penalized) least squares solution, one would try to minimize the quantity $\sum_{i=1}^n (Y_i - \mu_i)^2$ under the constraint $c = \mu_1 \le \mu_2 \le \cdots \le \mu_n$. Taking $\beta$ as a Lagrange multiplier, we minimize
$$\sum_{i=1}^n (Y_i - \mu_i)^2 - 2n\beta\mu_1 \qquad (2)$$
subject to $\mu_1 = c$. The following characterization, similar to that of the penalized LSE, can be proved using similar arguments. See Section 3 for a proof.

Lemma 1. $\hat\mu^c$, the minimizer of (2), can be characterized as
$$\hat\mu^c(x_k) = \begin{cases} c, & \text{if } k \le s_0, \\ \tilde\mu(x_k), & \text{if } k > s_0, \end{cases}$$
where $s_0 = n \cdot \operatorname{argmin}_{0 < x \le 1}(F(x) - cx)$. For convenience, we define $x_{s_0} = s_0/n$ and $x_{m_0} = m_0/n$.
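Both fits entering $Q$ therefore have closed forms in terms of the cusum diagram: the unconstrained penalized fit equals $\hat\mu(0+)$ up to $m_0$ and $\tilde\mu$ afterwards (Proposition 1), while the constrained fit equals $c$ up to $s_0$ and $\tilde\mu$ afterwards (Lemma 1). The sketch below computes $Q$ directly from these characterizations, reusing the `pava` helper from the previous snippet; the function name and the neglect of boundary ties are our own simplifications.

```python
import numpy as np

def penalized_Q(y, alpha, c):
    """Penalized RSS difference Q for testing mu(0+) = c (Proposition 1 and Lemma 1)."""
    n = len(y)
    F = np.cumsum(y) / n                       # cusum diagram F_r
    r = np.arange(1, n + 1)
    mu_tilde = pava(y)                         # unpenalized LSE

    # unconstrained penalized fit: hat(mu)(0+) up to m_0, tilde(mu) beyond
    vals = (F + alpha) / (r / n)
    m0 = int(np.argmin(vals)) + 1
    mu_hat = mu_tilde.copy()
    mu_hat[:m0] = vals.min()

    # constrained fit under mu(0+) = c: equal to c up to s_0, tilde(mu) beyond
    s0 = int(np.argmin(F - c * r / n)) + 1
    mu_hat_c = mu_tilde.copy()
    mu_hat_c[:s0] = c

    return (np.sum((y - mu_hat_c) ** 2) - 2 * n * alpha * c) \
        - (np.sum((y - mu_hat) ** 2) - 2 * n * alpha * mu_hat[0])
```

Large values of $Q$ speak against the hypothesis $\mu(0+) = c$; Theorem 1 below identifies the limiting null distribution $\sigma^2 J$ that can be used to calibrate it.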

2.3 Limit distribution for the estimators

We state the main result on the asymptotic behavior of the LSE and the RSS (both penalized) in the following theorem. First, we need a few definitions. Define $G(t) = B(t) + t^2$ and $G_1(t) = (G(t) + 1)/t$, where $B$ is a standard Brownian motion, and let $Z(t)$ be the right-derivative process of the greatest convex minorant (GCM) of the process $G(t)$. Further, define $X = \operatorname{argmin} G$, $X_1 = \operatorname{argmin} G_1$ and $U_1 = \inf G_1$. Let
$$J = \begin{cases} X_1 U_1^2 + \displaystyle\int_{X_1}^{X} Z^2(t)\,dt, & \text{if } X_1 \le X, \\[6pt] X_1 U_1^2 - \displaystyle\int_{X}^{X_1} Z^2(t)\,dt, & \text{if } X < X_1. \end{cases}$$
We state the main result as follows. The proof is presented in Section 3.

Theorem 1. Suppose $\mu'(0+) > 0$ and define $a = \mu'(0+)/2$. Under the penalization $\alpha = (\sigma^4/(n^2 a))^{1/3}$, we have
$$n^{1/3}\big(\hat\mu(0+) - c\big) \Rightarrow (a\sigma^2)^{1/3}\,U_1, \qquad Q \Rightarrow \sigma^2 J,$$
where $\Rightarrow$ denotes convergence in distribution and $c = \mu(0+)$ denotes the true value at the end-point.

In the following, we state some preparatory results, with their proofs deferred until Section 3. They will be essential in proving Theorem 1. To investigate the large-sample properties of $Q$, we need to derive the limits of $\hat\mu$ and $\hat\mu^c$ respectively. Equivalently, we look at the joint limit distribution of $(x_{m_0}, \hat\mu(0+), x_{s_0}, \tilde\mu)$. Our next proposition provides the result.

Proposition 2. Suppose $a$ and $\alpha$ are defined as in Theorem 1. Define a scaled version of the LSE, viz. $Z_n(t) = n^{1/3}\big(\tilde\mu\big((\sigma^2/na^2)^{1/3} t\big) - c\big)$. Then, as $n \to \infty$, jointly,
$$n^{1/3}\big(\hat\mu(0+) - c\big) \Rightarrow (a\sigma^2)^{1/3} U_1,$$
$$n^{1/3} x_{m_0} \Rightarrow (\sigma^2/a^2)^{1/3} X_1, \qquad n^{1/3} x_{s_0} \Rightarrow (\sigma^2/a^2)^{1/3} X,$$
$$\{Z_n(t)\}_{t \ge 0} \Rightarrow \{(a\sigma^2)^{1/3} Z(t)\}_{t \ge 0}.$$

The proof is deferred until the next section. The last step before proving Theorem 1 is to express $Q$ in terms of $(x_{m_0}, \hat\mu(0+), x_{s_0}, \tilde\mu)$, or equivalently $(m_0, \hat\mu(0+), s_0, \tilde\mu)$. The proof of the following lemma requires algebraic manipulation and several properties of the estimator $\tilde\mu$.

Lemma 2. $Q$ can be expressed as follows:
$$Q = \begin{cases} m_0\big(\hat\mu(0+) - c\big)^2 + \displaystyle\sum_{i=m_0+1}^{s_0}\big(\tilde\mu_i - c\big)^2, & \text{if } m_0 < s_0, \\[6pt] m_0\big(\hat\mu(0+) - c\big)^2 - \displaystyle\sum_{i=s_0+1}^{m_0}\big(\tilde\mu_i - c\big)^2, & \text{if } s_0 \le m_0. \end{cases}$$

Using Proposition 2 and Lemma 2, we can deduce Theorem 1 using the continuous mapping theorem. The details are sketched in the proof in the next section.
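The limit variables $(X, X_1, U_1, Z, J)$ are functionals of a single Brownian motion and have no closed-form distribution, but their quantiles can be approximated by Monte Carlo: discretize $B$ on a fine grid, form $G$ and $G_1$, and take the greatest convex minorant numerically. A rough sketch follows; the truncation horizon, grid step, number of draws and the GCM routine are our own choices and are not part of the paper.

```python
import numpy as np

def gcm_right_derivative(t, g):
    """Right derivative (slope) of the greatest convex minorant of (t, g) at each grid point."""
    hull = [0]                                   # indices of the lower convex hull vertices
    for i in range(1, len(t)):
        hull.append(i)
        while len(hull) >= 3:                    # restore convexity among the last three vertices
            i0, i1, i2 = hull[-3], hull[-2], hull[-1]
            if (g[i1] - g[i0]) * (t[i2] - t[i1]) >= (g[i2] - g[i1]) * (t[i1] - t[i0]):
                hull.pop(-2)
            else:
                break
    slopes = np.empty(len(t))
    for a, b in zip(hull[:-1], hull[1:]):
        slopes[a:b] = (g[b] - g[a]) / (t[b] - t[a])
    slopes[-1] = slopes[-2]
    return slopes

def draw_J(T=10.0, dt=1e-3, rng=None):
    """One approximate draw of (U_1, J) from the limit in Theorem 1."""
    rng = rng if rng is not None else np.random.default_rng()
    t = np.arange(dt, T, dt)
    B = np.cumsum(rng.normal(scale=np.sqrt(dt), size=len(t)))   # Brownian path on the grid
    G = B + t**2
    G1 = (G + 1.0) / t
    X, X1, U1 = t[np.argmin(G)], t[np.argmin(G1)], np.min(G1)
    Z = gcm_right_derivative(t, G)
    lo, hi = sorted((X1, X))
    integral = np.sum(Z[(t >= lo) & (t < hi)] ** 2) * dt
    J = X1 * U1**2 + (integral if X1 <= X else -integral)
    return U1, J

rng = np.random.default_rng(1)
J_draws = np.array([draw_J(rng=rng)[1] for _ in range(300)])
print(np.quantile(J_draws, [0.90, 0.95]))        # approximate critical values for Q / sigma^2
```

In practice $\sigma^2$ and $a = \mu'(0+)/2$ (which enters through $\alpha$) must be replaced by consistent estimates, as discussed in Section 2.4, before such quantiles can be turned into confidence sets for $\mu(0+)$.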

2.4 Discussion

Theorem 1 establishes consistency of the penalized LSE and gives the limit distributions of both the LSE and the RSS. This is analogous to the asymptotic normality of the usual LSE and the $\chi^2$ convergence of the corresponding RSS in regular parametric models. Moreover, it can be compared to the corresponding results for monotone regression at an interior point. There, the LSE converges to a non-normal distribution at rate $n^{1/3}$, and the RSS or the likelihood ratio converges to a universal distribution different from $\chi^2$; see, e.g., Banerjee and Wellner (2001). At the end-point, however, we get a completely different distribution, and the effect of $\mu'(0+)$ cannot be neglected. It should also be noted that we need consistent estimators of $\sigma^2$ as well as $\mu'(0+)$ to obtain a confidence interval, and both are relatively tricky. Meyer and Woodroofe (2000) discuss a method that yields a consistent estimator of $\sigma^2$ using the LSE $\tilde\mu$. The estimation of $\mu'(0+)$ can be done using techniques similar to those in density estimation, discussed by Pal (2006).

3 Proofs

3.1 Proof of Proposition 1

We observe that
$$\sum_{i=1}^n (Y_i - \mu_i)^2 - 2n\alpha\mu_1 = \sum_{i=1}^n (\tilde Y_i - \mu_i)^2 - 2n\alpha Y_1 - n^2\alpha^2,$$
where $\tilde Y_1 = Y_1 + n\alpha$ and $\tilde Y_i = Y_i$ for $i \ge 2$. Therefore the minimizer $\hat\mu$ is the PAVA estimator for the new data $\tilde Y_1, Y_2, \ldots, Y_n$, i.e. $\hat\mu_k = \min_{s \ge k}\max_{r \le k}(\tilde Y_r + \cdots + \tilde Y_s)/(s - r + 1)$. Therefore,
$$\hat\mu(0+) = \min_{r \ge 1}\frac{\tilde Y_1 + \cdots + \tilde Y_r}{r} = \min_{r \ge 1}\frac{Y_1 + \cdots + Y_r + n\alpha}{r} = \min_{r \ge 1}\frac{n(F_r + \alpha)}{r} = \min_{0 < x \le 1}\frac{F(x) + \alpha}{x}.$$
Moreover, except on the first block (i.e. up to the point where $\hat\mu_i = \hat\mu_1$), the contribution of $\tilde Y_1$ does not influence the estimator $\hat\mu$, and hence $\hat\mu_k = \tilde\mu_k$.

Let $m_0$ denote the first break-off point of the estimator $\hat\mu$. It is characterized as
$$m_0 = \operatorname{argmin}_{r \ge 1}\frac{Y_1 + \cdots + Y_r + n\alpha}{r} = \operatorname{argmin}_{r}\frac{n(F_r + \alpha)}{r} = n \cdot \operatorname{argmin}_{0 < x \le 1}\frac{F(x) + \alpha}{x}.$$
The proposition follows.

3.2 Proof of Lemma 1

As in Proposition 1, $\hat\mu^c$ is the PAVA estimator of the new data set $Y_1 + n\hat\beta, Y_2, \ldots, Y_n$, where $\hat\beta$ solves the equation
$$\hat\mu^c_1 = \min_{s \ge 1}\frac{Y_1 + \cdots + Y_s + n\hat\beta}{s} = c.$$
Therefore $\hat\beta = \max_{s \ge 1}(cs/n - F_s) = \max_{0 < x \le 1}(cx - F(x))$. To see this, observe that
$$\min_{s \ge 1}\frac{Y_1 + \cdots + Y_s + n\hat\beta}{s} = c
\iff \frac{Y_1 + \cdots + Y_s + n\hat\beta}{s} \ge c \ \text{for all } s, \text{ with equality at least once,}$$
$$\iff nF_s + n\hat\beta \ge cs \ \text{for all } s, \text{ with equality at least once,}
\iff \hat\beta \ge \frac{cs}{n} - F_s \ \text{for all } s, \text{ with equality at least once,}$$
and the definition of $\hat\beta$ follows. Moreover, $\hat\mu^c = c$ up to its first break-off point $s_0$, and equals $\tilde\mu$ beyond that. Clearly, $s_0 = \operatorname{argmin}_s(F_s - cs/n) = n \cdot \operatorname{argmin}_{0 < x \le 1}(F(x) - cx)$.

3.3 Proof of Proposition 2

Define the step function $\Lambda_n(x) = \lfloor nx\rfloor/n$. To prove the result, we invoke the Hungarian embedding for partial sums of i.i.d. random variables; see, e.g., Karatzas and Shreve (1991) for a discussion. Define the partial sums $S_r = \epsilon_1 + \cdots + \epsilon_r$ for $r = 1, \ldots, n$, and assume that the $\epsilon_i$ have a finite moment generating function and common variance $\sigma^2$.

Then we can ascertain the existence of a standard Brownian motion $B_n$ for each $n$, and a sequence of random variables $T_n$ having the same distribution as $S_n$, such that
$$\sup_t\,\big|T_{nt} - \sigma\sqrt{n}\,B_n(t)\big| = O(\log n)$$
almost surely. This result is also known as the strong approximation. Since we are looking at convergence in distribution, we treat $T_n$ and $S_n$ alike and replace one by the other. We observe that
$$F(x) = \frac1n\sum_{i=1}^{\lfloor nx\rfloor} Y_i + O_p\Big(\frac1n\Big) = \frac1n\sum_{i=1}^{\lfloor nx\rfloor}\big[\mu(x_i) + \epsilon_i\big] + O_p\Big(\frac1n\Big)$$
$$= \int_0^x \mu(t)\,d\Lambda_n(t) + \frac1n S_{n\Lambda_n(x)} + O_p\Big(\frac1n\Big)$$
$$= \int_0^x \mu(t)\,dt + \frac{\sigma}{\sqrt n}B_n(x) + O_p\Big(\frac1n\Big) + O\Big(\frac{\log n}{n}\Big)$$
$$= \int_0^x \Big(\mu(0) + t\mu'(0) + \frac{t^2}{2}\mu''(t^*)\Big)\,dt + \frac{\sigma}{\sqrt n}B_n(x) + O_p\Big(\frac1n\Big) + O\Big(\frac{\log n}{n}\Big)$$
$$= cx + ax^2 + o(x^2) + \frac{\sigma}{\sqrt n}B_n(x) + O_p\Big(\frac1n\Big) + O\Big(\frac{\log n}{n}\Big)$$
$$= cx + ax^2 + \frac{\sigma}{\sqrt n}B_n(x) + R_n(x),$$
where the remainder $R_n(x)$ is negligible for the calculations that follow. Define $x_\epsilon = (\sigma^2/na^2)^{1/3}$ and $B_n^*(t) = B_n(x_\epsilon t)/\sqrt{x_\epsilon}$, which is again a standard Brownian motion. Then
$$x_{m_0} = \operatorname{argmin}_{0 < x \le 1}\frac{F(x) + \alpha}{x} = \operatorname{argmin}_{0 < x \le 1}\frac{1}{x}\Big[cx + ax^2 + \frac{\sigma}{\sqrt n}B_n(x) + \alpha + R_n(x)\Big]$$
$$= x_\epsilon \operatorname{argmin}_{0 \le t \le 1/x_\epsilon}\frac{1}{t}\Big[a x_\epsilon^2 t^2 + \sigma\sqrt{\tfrac{x_\epsilon}{n}}\,B_n^*(t) + \alpha\Big] + o_p(x_\epsilon),$$
taking $x = x_\epsilon t$ and dropping the constant term $c$ and the negligible $R_n$. Using $\alpha = (\sigma^4/(n^2 a))^{1/3}$, so that $a x_\epsilon^2 = \sigma\sqrt{x_\epsilon/n} = \alpha$, we conclude that
$$x_{m_0} = \Big(\frac{\sigma^2}{na^2}\Big)^{1/3}\operatorname{argmin}_{0 \le t \le (na^2/\sigma^2)^{1/3}}\Big[\frac{t^2 + B_n^*(t) + 1}{t}\Big] + o_p(n^{-1/3}).$$

Therefore,
$$n^{1/3}x_{m_0} \Rightarrow \Big(\frac{\sigma^2}{a^2}\Big)^{1/3}X_1.$$
Moreover,
$$\hat\mu(0+) - c = \min_{0 < x \le 1}\frac{F(x) + \alpha}{x} - c = \min_{0 < x \le 1}\frac{a x^2 + \frac{\sigma}{\sqrt n}B_n(x) + \alpha + R_n(x)}{x} = \min_{0 \le t \le 1/x_\epsilon} a x_\epsilon\,\frac{t^2 + B_n^*(t) + 1}{t} + o_p(n^{-1/3}),$$
i.e.
$$n^{1/3}\big(\hat\mu(0+) - c\big) \Rightarrow (a\sigma^2)^{1/3}U_1.$$
Consider the constrained estimator now. First,
$$x_{s_0} = \operatorname{argmin}_{0 < x \le 1}\big(F(x) - cx\big) = \operatorname{argmin}_{0 < x \le 1}\Big[a x^2 + \frac{\sigma}{\sqrt n}B_n(x) + R_n(x)\Big] = \Big(\frac{\sigma^2}{na^2}\Big)^{1/3}\operatorname{argmin}_t\big(t^2 + B_n^*(t)\big) + o_p(n^{-1/3}),$$
so that
$$n^{1/3}x_{s_0} \Rightarrow \Big(\frac{\sigma^2}{a^2}\Big)^{1/3}X.$$
We now look at the behavior of $\tilde\mu$, properly scaled, in the region between $x_{s_0}$ and $x_{m_0}$. Recall that $\tilde\mu(s)$ is the right derivative of the greatest convex minorant of the cusum diagram $F(x)$ at the point $x = s$. Approximating $F(x)$ by $cx + ax^2 + (\sigma/\sqrt n)B_n(x)$ in that interval, we write
$$\tilde\mu(s) = \frac{d}{dx}\Big(\mathrm{GCM}\big(cx + ax^2 + \tfrac{\sigma}{\sqrt n}B_n(x)\big)\Big)\Big|_{x = s}.$$
Using the scaling $x = x_\epsilon t$ as before, and the fact that the GCM of a process plus a linear term equals the linear term plus the GCM of the process itself, we get
$$\tilde\mu(x_\epsilon t) - c = \frac{1}{x_\epsilon}\frac{d}{dt}\Big(\mathrm{GCM}\big(a x_\epsilon^2 t^2 + \sigma\sqrt{\tfrac{x_\epsilon}{n}}\,B_n^*(t)\big)\Big) = a x_\epsilon\,\frac{d}{dt}\Big(\mathrm{GCM}\big(t^2 + B_n^*(t)\big)\Big) \quad\Big(\text{since } a x_\epsilon^2 = \sigma\sqrt{\tfrac{x_\epsilon}{n}}\Big)$$
$$= \Big(\frac{a\sigma^2}{n}\Big)^{1/3}\frac{d}{dt}\Big(\mathrm{GCM}\big(t^2 + B_n^*(t)\big)\Big).$$

This yields
$$n^{1/3}\Big(\tilde\mu\big(t(\sigma^2/na^2)^{1/3}\big) - c\Big) \Rightarrow (a\sigma^2)^{1/3}Z(t).$$
The remainders in all these calculations are negligible, since we only look at points that are at distance $O_p(n^{-1/3})$ from zero. Moreover, since the individual limits are derived from the same Brownian motion $B_n$, the joint convergence follows.

3.4 Proof of Lemma 2

The following identities are useful for proving the lemma:
$$\sum_{i=1}^{m_0} Y_i = \sum_{i=1}^{m_0}\tilde\mu_i = m_0\hat\mu_1 - n\alpha, \qquad\text{(A)}$$
$$\sum_{i=1}^{s_0} Y_i = \sum_{i=1}^{s_0}\tilde\mu_i = cs_0 - n\hat\beta. \qquad\text{(B)}$$
If $m_0 < s_0$,
$$\sum_{i=m_0+1}^{s_0} Y_i = \sum_{i=m_0+1}^{s_0}\tilde\mu_i, \qquad\text{(C1)}$$
$$\sum_{i=m_0+1}^{s_0} Y_i\tilde\mu_i = \sum_{i=m_0+1}^{s_0}\tilde\mu_i^2. \qquad\text{(C2)}$$
The first part of (A) follows from the properties of the PAVA estimator, which is constant over each block and equals the average of the $Y$-values in that block. The second part follows similarly, since $m_0$ is the first break-off point for $\tilde Y_1, Y_2, \ldots$, and therefore $\hat\mu_1 = (\tilde Y_1 + \cdots + Y_{m_0})/m_0 = (\sum_{i=1}^{m_0} Y_i + n\alpha)/m_0$. (B) and (C1) follow similarly. (C2) follows by splitting the summation over blocks first, and then taking the sum of the $Y$-values in each block.

In case $s_0 \le m_0$, the equalities (C1) and (C2) remain valid, with the roles of $m_0$ and $s_0$ interchanged. Now,
$$Q = \Big[\min_{\mu,\,\mu_1 = c}\sum_{i=1}^n(Y_i - \mu_i)^2 - 2n\alpha c\Big] - \Big[\min_{\mu}\sum_{i=1}^n(Y_i - \mu_i)^2 - 2n\alpha\mu_1\Big]$$
$$= \sum_{i=1}^n(Y_i - \hat\mu^c_i)^2 - 2n\alpha c - \sum_{i=1}^n(Y_i - \hat\mu_i)^2 + 2n\alpha\hat\mu_1$$
$$= \sum_{i=1}^{s_0}(Y_i - c)^2 + \sum_{i=s_0+1}^n(Y_i - \tilde\mu_i)^2 - \sum_{i=1}^{m_0}(Y_i - \hat\mu_1)^2 - \sum_{i=m_0+1}^n(Y_i - \tilde\mu_i)^2 + 2n\alpha\big(\hat\mu(0+) - c\big).$$
We use the identities (A), (B), (C1) and (C2) in the following two cases.

Case 1: $m_0 < s_0$.
$$Q = 2n\alpha\big(\hat\mu(0+) - c\big) + \sum_{i=1}^{m_0}\big[(Y_i - c)^2 - (Y_i - \hat\mu_1)^2\big] + \sum_{i=m_0+1}^{s_0}\big[(Y_i - c)^2 - (Y_i - \tilde\mu_i)^2\big]$$
$$= 2n\alpha\big(\hat\mu(0+) - c\big) + 2(\hat\mu_1 - c)\sum_{i=1}^{m_0}Y_i + m_0\big[c^2 - \hat\mu_1^2\big] + \sum_{i=m_0+1}^{s_0}\big[c^2 - 2cY_i - \tilde\mu_i^2 + 2\tilde\mu_i Y_i\big]$$
$$= 2m_0(\hat\mu_1 - c)\hat\mu_1 + m_0\big[c^2 - \hat\mu_1^2\big] + \sum_{i=m_0+1}^{s_0}\big[c^2 - 2c\tilde\mu_i - \tilde\mu_i^2 + 2\tilde\mu_i^2\big]$$
$$= m_0\big(\hat\mu(0+) - c\big)^2 + \sum_{i=m_0+1}^{s_0}\big(\tilde\mu_i - c\big)^2.$$

Case 2: $s_0 \le m_0$.
$$Q = 2n\alpha(\hat\mu_1 - c) + \sum_{i=1}^{s_0}\big[(Y_i - c)^2 - (Y_i - \hat\mu_1)^2\big] + \sum_{i=s_0+1}^{m_0}\big[(Y_i - \tilde\mu_i)^2 - (Y_i - \hat\mu_1)^2\big]$$
$$= 2n\alpha(\hat\mu_1 - c) + 2(\hat\mu_1 - c)\sum_{i=1}^{s_0}Y_i + s_0\big[c^2 - \hat\mu_1^2\big] + \sum_{i=s_0+1}^{m_0}\big[\tilde\mu_i^2 - \hat\mu_1^2 - 2\tilde\mu_i Y_i + 2\hat\mu_1 Y_i\big]$$
$$= 2n\alpha(\hat\mu_1 - c) + 2(\hat\mu_1 - c)(cs_0 - n\hat\beta) + s_0\big[c^2 - \hat\mu_1^2\big] - (m_0 - s_0)\hat\mu_1^2 + \sum_{i=s_0+1}^{m_0}\big[\tilde\mu_i^2 - 2\tilde\mu_i^2\big] + 2\hat\mu_1\sum_{i=s_0+1}^{m_0}\tilde\mu_i$$
$$= 2(\hat\mu_1 - c)\Big(n\alpha + cs_0 - n\hat\beta + \sum_{i=s_0+1}^{m_0}\tilde\mu_i\Big) + s_0 c^2 + c^2(m_0 - s_0) - m_0\hat\mu_1^2 - \sum_{i=s_0+1}^{m_0}\big(\tilde\mu_i - c\big)^2$$
$$= m_0\big(\hat\mu(0+) - c\big)^2 - \sum_{i=s_0+1}^{m_0}\big(\tilde\mu_i - c\big)^2 \qquad\Big(\text{since } \sum_{i=s_0+1}^{m_0}\tilde\mu_i = m_0\hat\mu_1 - n\alpha - cs_0 + n\hat\beta\Big).$$

The lemma follows.

3.5 Proof of Theorem 1

To start with, let $s_0 \le m_0$. Then
$$\sum_{i=s_0+1}^{m_0}\big(\tilde\mu_i - c\big)^2 = n\int_{x_{s_0}}^{x_{m_0}}\big(\tilde\mu(s) - c\big)^2\,d\Lambda_n(s) = n\int_{x_{s_0}}^{x_{m_0}}\big(\tilde\mu(s) - c\big)^2\,ds + o_p(1),$$
since both $x_{m_0}$ and $x_{s_0}$ are of order $n^{-1/3}$. Similarly, for $m_0 < s_0$, we have
$$\sum_{i=m_0+1}^{s_0}\big(\tilde\mu_i - c\big)^2 = n\int_{x_{m_0}}^{x_{s_0}}\big(\tilde\mu(s) - c\big)^2\,ds + o_p(1).$$
From Proposition 2, $(na^2/\sigma^2)^{1/3}(x_{m_0}, x_{s_0})$ converges weakly to $(X_1, X)$, where the limit processes are functions of the same standard Brownian motion $B(t)$. Therefore, substituting $s = t\,(\sigma^2/na^2)^{1/3}$,
$$n\int_{x_{m_0}}^{x_{s_0}}\big(\tilde\mu(s) - c\big)^2\,ds = \int_{(na^2/\sigma^2)^{1/3}x_{m_0}}^{(na^2/\sigma^2)^{1/3}x_{s_0}}\Big(\frac{\sigma^2}{a^2}\Big)^{1/3}\Big[n^{1/3}\Big(\tilde\mu\big(t(\sigma^2/na^2)^{1/3}\big) - c\Big)\Big]^2\,dt \Rightarrow \sigma^2\int_{X_1}^{X}Z^2(t)\,dt.$$
The same result holds on the set $\{x_{s_0} \le x_{m_0}\}$, with the roles interchanged. Finally, we recall that
$$m_0\big(\hat\mu(0+) - c\big)^2 = n^{1/3}x_{m_0}\,\big[n^{1/3}\big(\hat\mu(0+) - c\big)\big]^2 \Rightarrow \Big(\frac{\sigma^2}{a^2}\Big)^{1/3}X_1\,(a\sigma^2)^{2/3}U_1^2 = \sigma^2 X_1 U_1^2.$$
Therefore, using Lemma 2,
$$Q \Rightarrow \sigma^2\Big(X_1 U_1^2 + \int_{X_1}^{X}Z^2(t)\,dt\Big) = \sigma^2 J,$$
where the integral changes sign when $X_1 > X$.

Acknowledgments

The research was supported by NSF-SAMSI cooperative agreement No. DMS-0112069, during postdoctoral research at SAMSI. Discussions with Professor Moulinath Banerjee have been instrumental for the inception of the problem. The author also acknowledges the contribution of Professor Michael Woodroofe as the doctoral thesis supervisor.

References

[1] Banerjee, M. and Wellner, J. A. (2001), Likelihood ratio tests for monotone functions, Ann. Statist. 29, 1699-1731.
[2] Groeneboom, P. and Wellner, J. A. (2001), Computing Chernoff's distribution, Journal of Computational and Graphical Statistics 10, 388-400.
[3] Karatzas, I. and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus (Springer-Verlag, New York).
[4] Meyer, M. and Woodroofe, M. B. (2000), On the degrees of freedom in shape-restricted regression, Ann. Statist. 28, 1083-1104.
[5] Pal, J. K. (2006), Statistical Analysis and Inference in Shape-restricted Problems with Applications to Astronomy (Doctoral Dissertation, University of Michigan).
[6] Robertson, T., Wright, F. T. and Dykstra, R. L. (1987), Order Restricted Statistical Inference (John Wiley, New York).
[7] Woodroofe, M. B. and Sun, J. (1993), A penalized maximum likelihood estimate of f(0+) when f is non-increasing, Statist. Sinica 3, 501-515.
[8] Wright, F. T. (1981), The asymptotic behavior of monotone regression estimates, Ann. Statist. 9, 443-448.