Testing convex hypotheses on the mean of a Gaussian vector. Application to testing qualitative hypotheses on a regression function

Similar documents
Model Selection and Geometry

TESTING REGRESSION MONOTONICITY IN ECONOMETRIC MODELS

NON ASYMPTOTIC MINIMAX RATES OF TESTING IN SIGNAL DETECTION

Can we do statistical inference in a non-asymptotic way? 1

Multivariate Regression

An introduction to some aspects of functional analysis

Introduction and Preliminaries

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

GAUSSIAN MODEL SELECTION WITH AN UNKNOWN VARIANCE. By Yannick Baraud, Christophe Giraud and Sylvie Huet Université de Nice Sophia Antipolis and INRA

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Lecture Notes 1: Vector spaces

The Skorokhod reflection problem for functions with discontinuities (contractive case)

Linear Algebra Massoud Malek

STAT 200C: High-dimensional Statistics

Lebesgue Measure on R n

Course 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra

Discussion of Hypothesis testing by convex optimization

University of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May

THE STRUCTURE OF 3-CONNECTED MATROIDS OF PATH WIDTH THREE

LECTURE 5 HYPOTHESIS TESTING

OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1

Boolean Inner-Product Spaces and Boolean Matrices

On John type ellipsoids

Optimization and Optimal Control in Banach Spaces

Hierarchy among Automata on Linear Orderings

Some functional (Hölderian) limit theorems and their applications (II)

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

NEW FUNCTIONAL INEQUALITIES

Reconstruction from Anisotropic Random Measurements

Optimal Estimation of a Nonsmooth Functional

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Some Background Material

Supremum of simple stochastic processes

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

Stochastic Convergence, Delta Method & Moment Estimators

The small ball property in Banach spaces (quantitative results)

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Lecture 7 Introduction to Statistical Decision Theory

CHAPTER 6. Representations of compact groups

The Algebra of Tensors; Tensors on a Vector Space Definition. Suppose V 1,,V k and W are vector spaces. A map. F : V 1 V k

On Schmidt and Summerer parametric geometry of numbers

Lebesgue Measure on R n

Estimation of a quadratic regression functional using the sinc kernel

The Canonical Gaussian Measure on R

Elements of Convex Optimization Theory

arxiv: v1 [math.st] 10 May 2009

1.1. MEASURES AND INTEGRALS

1 Directional Derivatives and Differentiability

arxiv: v1 [math.pr] 22 May 2008

The Block Criterion for Multiscale Inference About a Density, With Applications to Other Multiscale Problems

2014:05 Incremental Greedy Algorithm and its Applications in Numerical Integration. V. Temlyakov

A SYNOPSIS OF HILBERT SPACE THEORY

A SHORT INTRODUCTION TO BANACH LATTICES AND

Linear Regression and Its Applications

High-dimensional regression with unknown variance

A note on some approximation theorems in measure theory

Recall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm

Statistical inference on Lévy processes

Class 2 & 3 Overfitting & Regularization

Optimal stopping for non-linear expectations Part I

[y i α βx i ] 2 (2) Q = i=1

TYPICAL POINTS FOR ONE-PARAMETER FAMILIES OF PIECEWISE EXPANDING MAPS OF THE INTERVAL

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

Stochastic Design Criteria in Linear Models

UNIVERSITÄT POTSDAM Institut für Mathematik

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

Statistical Hypothesis Testing

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing.

Improving linear quantile regression for

Gaussian Processes. 1. Basic Notions

Analysis-3 lecture schemes

Nonparametric regression with martingale increment errors

Optimization Theory. A Concise Introduction. Jiongmin Yong

Some notes on Coxeter groups

1 Inner Product Space

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

Chapter 2. Vectors and Vector Spaces

Reproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto

Estimates for probabilities of independent events and infinite series

An introduction to Mathematical Theory of Control

Banach Spaces II: Elementary Banach Space Theory

ON THE NUMBER OF COMPONENTS OF A GRAPH

Sample Spaces, Random Variables

Chapter 3 Transformations

MODULE 8 Topics: Null space, range, column space, row space and rank of a matrix

Algebras of singular integral operators with kernels controlled by multiple norms

MAT 257, Handout 13: December 5-7, 2011.

Finding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October

EE731 Lecture Notes: Matrix Computations for Signal Processing

Distributionally Robust Discrete Optimization with Entropic Value-at-Risk

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond

Only Intervals Preserve the Invertibility of Arithmetic Operations

2. Function spaces and approximation

6. Duals of L p spaces

REVIEW OF ESSENTIAL MATH 346 TOPICS

CHAPTER 7. Connectedness

ON THE CHOICE OF TEST STATISTIC FOR CONDITIONAL MOMENT INEQUALITES. Timothy B. Armstrong. October 2014 Revised July 2017

Testing Algebraic Hypotheses

The Rademacher Cotype of Operators from l N

Transcription:

Testing convex hypotheses on the mean of a Gaussian vector. Application to testing qualitative hypotheses on a regression function By Y. Baraud, S. Huet, B. Laurent Ecole Normale Supérieure, DMA INRA, Laboratoire de Biométrie. Université Paris-Sud. In this paper we propose a general methodology, based on multiple testing, for testing that the mean of a Gaussian vector in R n belongs to a convex set. We show that the test achieves its nominal level, and characterize a class of vectors over which the tests achieve a prescribed power. In the functional regression model, this general methodology is applied to test some qualitative hypotheses on the regression function. For example, we test that the regression function is positive, increasing, convex, or more generally, satisfies a differential inequality. Uniform separation rates over classes of smooth functions are established and a comparison with other results in the literature is provided. A simulation study evaluates some of the procedures for testing monotonicity. 1) 1. Introduction. 1.1. The statistical framework. We consider the following regression model: Y i = F x i ) + σε i, i = 1,..., n where x 1 < x 2 <... < x n are known deterministic points in [0, 1], σ is an unknown positive number and ε i ) i=1,...,n is a sequence of i.i.d. unobservable standard Gaussian random variables. From the observation of Y = Y 1,..., Y n ), we consider the problem of testing that the regression function F belongs to one of the following functional sets K: 2) 3) 4) 5) K 0 = {F : [0, 1] R, F is non-negative } K = {F : [0, 1] R, F is non-decreasing } K = {F : [0, 1] R, F is non-concave } { d r } K r,r = F : [0, 1] R, x [0, 1], [Rx)F x)] 0. dxr In the above definition of K r,r, r denotes a positive integer and R a smooth, nonvanishing function from [0, 1] into R. Choosing the function R equal to 1 leads to Received AMS 2000 subject classifications. Primary 62G10; Secondary 62G20. Key words and phrases. tests of qualitative hypotheses, non-parametric test, test of positivity, test of monotonicity, test of convexity, rate of testing, Gaussian regression 1

2 Y. BARAUD, S. HUET, B. LAURENT test that the derivative of order r is positive. Taking r = 1 and choosing a suitable function R leads to test that a positive function F is decreasing at some prescribed rate. It is also possible to test that F belongs to some classes of smooth functions. These testing hypotheses will be detailed in Section 3. The problem is therefore to test some qualitative hypothesis on F. We shall show that it actually reduces to testing that the mean of the Gaussian vector Y belongs to a suitable convex subset of R n. Denoting by <, > the inner product of R n, this convex subset takes the form C = {f R n, j {1,..., p} < f, v j > 0}, where the vectors {v 1,..., v p } are linearly independent in R n. The aim of this paper is to present a general methodology for the problem of testing that f belongs to C and to characterize a class of vectors over which the tests achieve a prescribed power. This general methodology is applied to test that the regression function F belongs to one of the sets K. For the procedures we propose, the least-favorable distribution under the null hypothesis is achieved for F = 0 and σ = 1. Consequently, by carrying out simulations, we easily obtain tests that achieve their nominal level for fixed values of n. Moreover, we show that these tests have good properties under smooth alternatives. For the problem of testing positivity, monotonicity and convexity, we obtain tests based on the comparison of local means of consecutive observations. A precise description of these tests is given in Section 2. For the problem of testing monotonicity, our methodology also leads to tests based on the slopes of regression lines on short intervals, as explained in Section 3.1. These procedures, based on running gradients, are akin to those proposed by Hall and Heckman 2000). For the problem of testing that F belongs to K r,r with a non-constant function R we refer the reader to Section 3.2. We have delayed the description of the general methodology for testing that f belongs to C to Section 4. Simulations studies for testing monotonicity are shown in Section 5. The proofs are postponed to Sections 6 to 10. 1.2. An overview of the literature. In the literature, tests of monotonicity have been widely studied in the regression model. The test proposed by Bowman et al. 1998) is based on a procedure described in Silverman 1981) for testing unimodality of a density. This test is not powerful when the regression is flat or nearly flat as underlined by Hall and Heckman 2000). Hall and Heckman 2000) proposed a procedure based on running gradients over short intervals for which the leastfavorable distribution under the null, when σ is known, corresponds to the case where F is identically constant. The test proposed by Gijbels et al. 2000) is based on the signs of differences between observations. The test offers the advantage not to depend on the error distribution when it is continuous. Consequently, the nominal level of the test is guaranteed for all continuous error distributions. In the functional regression model with random x i s, the procedure proposed by Ghosal et al. 2001) is based on a locally weighted version of Kendall s tau. The procedure uses a kernel smoothing with a particular choice of the bandwidth and as in Gijbels et al. 2000) depends on the signs of the quantities Y j Y i )x i x j ). They show that for

TESTING CONVEX HYPOTHESES 3 certain local alternatives the power of their test tends to 1. Some comments on the power of our test under those alternatives can be found in Section 3.3. In Baraud, Huet and Laurent 2003) we propose a procedure which aims at detecting discrepancies with respect to the L 2 µ n )-distance where µ n = n 1 n i=1 δ x i. This procedure generalizes that proposed in Baraud, Huet and Laurent 2003) for linear hypotheses. A common feature of the present paper with these two lies in the fact that the proposed procedures achieve their nominal level and a prescribed power over a set of vectors we characterize. In the Gaussian white noise, Juditsky and Nemirovski 2000) propose to test that the signal belongs to the cone of nonnegative, non-decreasing or non-concave functions. For a given r [1, + [, their tests are based on the estimation of the L r -distance between the signal and the cone. However, this approach requires that the signal have a known smoothness under the null. In the Gaussian white noise model, other tests of such qualitative hypotheses are proposed by Dümbgen and Spokoiny 2001). Their procedure is based on the supremum over all bandwidths of the distance in sup-norm between a kernel estimator and the null hypothesis. They adopt a minimax point of view to evaluate the performances of their tests and we adopt the same in Sections 2 and 3. 1.3. Uniform separation rates and optimality. Comparing the performances of tests naturally arises in the problem of hypothesis testing. In this paper, we shall mainly describe the performances of our procedures in terms of uniform separation rates over classes of smooth functions. Given β in ]0, 1[, a class of smooth functions F and a distance.) to the null hypothesis, we define the uniform separation rate of a test Φ over F, denoted by ρφ, F, ), as the smallest number ρ such that the test guarantees a power not smaller than 1 β for all alternatives F in F at distance ρ from the null. More precisely, 6) ρφ, F, ) = inf {ρ > 0, F F, F ) ρ P F Φ rejects ) 1 β}. In the regression or Gaussian white noise model, the word rate refers to the asymptotics of ρφ, F, ) = ρ τ Φ, F, ) with respect to a scaling parameter τ the number of observations n in the regression model, the level of the noise in the Gaussian white noise). Comparing the performances of two tests of the same level amounts to comparing their uniform separation rates the smaller the better). A test is said to be optimal if there exists no better test. The uniform separation rate of an optimal test is called the minimax separation rate. In the sequel, we shall enlarge this notion of optimality by saying that a test is rate-optimal over F if its uniform separation rate differs from the minimax one by a bounded function of τ. Unfortunately, not much is known on the uniform separations rates of the tests mentioned in Section 1.2. The only exception we are aware of concerns the tests proposed by Dümbgen and Spokoiny 2001) and Juditsky and Nemirovski 2000) in the Gaussian white noise model with τ = 1/ n), and Baraud, Huet and Laurent 2003) in the regression model. The rates obtained by Juditsky and Nemirovski 2000) are established for the problem of testing that F belongs to K H where H is a class of smooth functions. In contrast, in the paper by Baraud, Huet and Laurent 2003) and Dümbgen and Spokoiny 2001), the null hypothesis is not restricted to those smooth functions belonging to K. For the problem of testing positivity and monotonicity,

4 Y. BARAUD, S. HUET, B. LAURENT Baraud, Huet and Laurent 2003) established separation rates with respect to the L 2 µ n )-distance to the null. For the problem of testing positivity, monotonicity and convexity, Dümbgen and Spokoiny 2001) considered the problem of detecting a discrepancy to the null in sup-norm. For any L > 0, their procedures are proved to achieve the optimal rate L logn)/n) 1/3 over the class of Lipschitz functions H 1 L) = {F, x, y [0, 1], F x) F y) L x y }. The optimality of this rate derives from the lower bounds established by Ingster 1993)[Section 2.4] for the more simple problem of testing F = 0 against F 0 in sup-norm. More generally, it can easily be derived from Ingster s results see Proposition 2) that the minimax separation rate in sup-norm) over Hölderian balls 7) H s L) = {F, x, y [0, 1], F x) F y) L x y s } with s ]0, 1] is bounded from below up to a constant) by L 1/s logn)/n ) s/1+2s). In the regression setting, we propose tests of positivity, monotonicity and convexity whose uniform separation rates over H s L) achieve this lower bound whatever the value of s ]0, 1] and L > 0. We only discuss the optimality in the minimax sense over the Hölderian balls H s L) with s ]0, 1] and L > 0. To our knowledge, the minimax rates over smoother classes of functions remains unknown and it is beyond the scope of this paper to describe them. For the problem of testing monotonicity or convexity, other choices of distances to the null are possible. For example the distance in sup-norm between the first respectively the second) derivative of F and the set of non-negative functions. For such choices, Dümbgen and Spokoiny also provided uniform separation rates for their tests. In the regression setting, the uniform separation rates we get coincide with their separation rates on the classes of functions they considered. We do not know whether these rates are optimal or not neither in the Gaussian white noise model nor in the regression one. 2. Tests based on local means for testing positivity, monotonicity and convexity. We consider the regression model given by 1) and propose tests of positivity, monotonicity and convexity for the function F. We first introduce some partitions of the design points and notations that will be used throughout the paper. 2.1. Partition of the design points and notations. We first define an almost regular partition of the set of indices {1,..., n} into l n sets as follows: for each k in {1,..., l n } we set { J k = i {1,..., n}, k 1 < i l n n k } l n and define the partition as J ln = {J k, k = {1,..., l n }}.

TESTING CONVEX HYPOTHESES 5 Then for each l {1,..., l n }, we make a partition of {1,..., n} into l sets by gathering consecutive sets J k. This partition is defined by J l = J 8) j l = J k, j = 1,..., l. We shall use the following notations. j 1 l < k ln j l We use a bold type style for denoting the vectors of R n. We endow R n with its Euclidean norm denoted by. For v R n, let v = max 1 i n v i. For a linear subspace V of R n, Π V denotes the orthogonal projector onto V. For a R +, D N\{0} and u [0, 1], Φ 1 u) and χ 1 D,a 2 u) denote the 1 u quantile of respectively a standard Gaussian random variable and a non-central χ 2 with D degrees of freedom and non-centrality parameter a 2. For x R, [x] denotes the integer part of x. For each R n -vector v and subset J of {1,..., n}, we denote by v J the R n - vector whose coordinates coincide with those of v on J and vanish elsewhere. We denote by v J the quantity i J v i/ J. We denote by 1 the R n -vector 1,..., 1) and by e i the ith vector of the canonical basis. We define V n,cste as the linear span of { 1 J, J J ln }. Note that the dimension of V n,cste equals l n. The vector ε denotes a standard Gaussian variable in R n. We denote by P f,σ the law of the Gaussian vector in R n with expectation f and covariance matrix σ 2 I n, where I n is the n n identity matrix. We denote by P F,σ the law of Y under the model defined by Equation 1). The level α of all our tests is chosen in ]0, 1/2[. 2.2. Test of positivity. We propose a level α-test for testing that F belongs to K 0 defined by 2). The testing procedure is based on the fact that if F is nonnegative, then for any subset J of {1,..., n}, the expectation of ȲY J is non-negative. For l {1,..., l n }, let T1 l Y ) be defined as T l 1 Y ) = max J J l J ȲY J n ln, Y Π Vn,cste Y and let q 1 l, u) be the 1 u quantile of the random variable T1 l ε). We introduce the test statistic { T α,1 = max T l 9) 1 Y ) q 1 l, u α ) }, l {1,...,l n}

6 Y. BARAUD, S. HUET, B. LAURENT where u α is defined as { 10) u α = sup u ]0, 1[, P max l {1,...,l n} { T l 1 ε) q 1 l, u) } ) } > 0 α. We reject that F belongs to K 0 if T α,1 is positive. Comment. When l increases from 1 to l n, the cardinality of the sets J J l decreases. We thus take into account local discrepancies to the null hypothesis for various scales. 2.3. Testing monotonicity. We now consider the problem of testing that F belongs to K defined by 3). The testing procedure lies on the following property: if I and J are two subsets of {1, 2,..., n} such that I is on the left of J and if F K, then the expectation of the difference ȲY I ȲY J is non-positive. For l {2,..., l n }, let T l 2 Y ) be defined as where T l 2 Y ) = max 1 i<j l N l ij ȲY J l i ȲY J l j n ln, Y Π Vn,cste Y ) 1/2 Nij l 1 = Ji l + 1 Jj l, and let q 2 l, u) be the 1 u quantile of the random variable T2 l ε). We introduce the test statistic { T α,2 = max T l 11) 2 Y ) q 2 l, u) }, l {2,...,l n} where u α is defined as { 12) u α = sup u ]0, 1[, P max l {2,...,l n} We reject that F belongs to K if T α,2 is positive. { T l 2 ε) q 2 l, u) } ) } > 0 α. 2.4. Testing convexity. We now consider the problem of testing that F belongs to K defined by 4). The testing procedure is based on the following property: if I, J and K are three subsets of {1, 2,..., n}, such that J is between I and K and if F K, then we find a linear combination of ȲY I, ȲY J, and ȲY K with non-positive expectation. Let x = x 1,..., x n ) and for each l {3,..., l n }, 1 i < j < k l, let λ l ijk = x J l k x J l j x J l k x J l i, and ) 1/2 Nijk l 1 = Jj l + λl ijk) 2 1 Ji l + 1 λl ijk) 2 1 Jk l.

TESTING CONVEX HYPOTHESES 7 For l {3,..., l n }, let T l 3 Y ) = max N ȲY ijk l J l j λ l ijkȳy J l i 1 λ l ijk )ȲY J l k 1 i<j<k l Y Π Vn,cste Y /, n l n and let q 3 l, u) be the 1 u quantile of the random variable T3 l ε). We introduce the test statistic 13) T α,3 = { max T l 3 Y ) q 3 l, u α ) }, l {3,...,l n} where u α is defined as { 14) u α = sup u ]0, 1[, P max l {3,...,l n} We reject that F belongs to K if T α,3 is positive. { T l 3 ε) q 3 l, u) } ) } > 0 α. 2.5. Properties of the procedures. In this section we evaluate the performances of the previous procedures under the null and under smooth alternatives. Proposition 1. Let T α, K) be either T α,1, K 0 ) or T α,2, K ) or T α,3, K ). We have sup sup P F,σ T α > 0) = α. σ>0 F K Assume now that x i = i/n for all i = 1,..., n and l n = [n/2]. Let us fix β ]0, 1[ and define for each s ]0, 1] and L > 0 σ ρ n = L 1/1+2s) 2 ) s/1+2s) logn). n Then, for n large enough there exists some constant κ depending on α, β, s only such that for all F H s L) satisfying 15) we have F ) = inf G K F G κρ n P F,σ T α > 0) 1 β. Comment. This result states that our procedures are of size α. Moreover, following the definition of the uniform separation rate of a test given in Section 1.3, this result shows that the tests achieve the uniform separation rate ρ n in sup-norm) over the Hölderian ball H s L). In the following proposition, we show that this rate cannot be improved at least in the Gaussian white noise model for testing positivity and monotonicity. The proof can be extended to the case of testing convexity but is omitted here.

8 Y. BARAUD, S. HUET, B. LAURENT Proposition 2. Let Y be the observation from the Gaussian white noise model 16) dy t) = F t)dt + 1 dw t), for t [0, 1], n where W is a standard Brownian motion. Let K be either the set K 0 or K and F some class of functions. For the distance.) to K given by 15), we define ρ n 0, F) = inf ρφ, F, ) where ρφ, F, ) is given by 6) and where the infimum is taken over all tests Φ of level 3α for testing F = 0. We define ρ n K, F) similarly by taking the infimum over all tests Φ of level α for testing F K. The following inequalities hold. i) If K = K 0 then ρ n K, F) ρ n 0, F). If K = K then for some constant κ depending on α and β only ρ n K, F) 1 [ρ n 0, F) κ σ ] n. 2 ii) In particular, if F = H s L), for n large enough there exists some constant κ depending on α, β and s only such that ) s/1+2s) logn) 17) ρ n K, F) κ L 1/1+2s). n The proof of the first part of the proposition extends easily to the regression framework. The second part ii), namely Inequality 17), derives from i) and the lower bound on ρ n 0, F) established by Ingster1993). For the problem of testing the positivity of a signal in the Gaussian white noise model, Juditsky and Nemirovski 2000) showed that the minimax separation rate with respect to the L r -distance r [1, + [) is of the same order as ρ n up to a logarithmic factor. 3. Testing that F satisfies a differential inequality. In this section, we consider the problem of testing that F belongs to K r,r defined by 5). Several applications of such hypotheses can be of interest. For example, by taking r = 1 and Rx) = expax) for some positive number a) one can test that a positive function F is decreasing at rate exp ax), that is, satisfies x [0, 1], 0 < F x) F 0) exp ax). Other kinds of decay are possible by suitably choosing the function R. Another application is to test that F belongs to the class of smooth functions { } F : [0, 1] R, F r) L. To tackle this problem, it is enough to test that the derivatives of order r of the functions F 1 x) = F x) + Lx r /r! and F 2 x) = F x) + Lx r /r! are positive. This is easily done by considering a multiple testing procedure based on the data Y i +

TESTING CONVEX HYPOTHESES 9 Lx r i /r! for testing that F 1 is positive, and on Y i + Lx r i /r! for testing that F 2 is positive. In Section 3.1, we consider the case where the function R equals 1. The procedure then amounts to testing that the derivative of order r of F is non-negative. We turn to the general case in Section 3.2. We first introduce the following notations. For w R n, we set R w the vector whose i-th coordinate R w) i equals Rx i )w i. For k N \ {0}, we denote by w k, the R n -vector w k 1,..., w k n), and we set w 0 = 1 by convention. For J {1,..., n}, let us define X J as the space spanned by 1 J, x J,, x r 1 J. 3.1. Testing that the derivative of order r of F is non-negative. In this section, we take Rx) = 1 for all x [0, 1]. The procedure lies on the idea that if the derivative of order r of F is non-negative then on each subset J of {1, 2,..., n}, the highest degree coefficient of the polynomial regression of degree r based on the pairs {x i, F x i )), i J} is non-negative. For example, under the assumption that F is non-decreasing, the slope of the regression based on the pairs {x i, F x i )), i J} is non-negative. Let l n = [n/2r + 1))], V n be the linear span of { 1 J, x J,, x r J, J J ln }, and for each J {1,..., n} t J = xr J Π X J x r J x r J Π X J x r J. For each l {1,..., l n }, let T l Y ) be defined as 18) T l < Y, t J > Y ) = max n dn J J l Y Π Vn Y and let ql, u) denote the 1 u quantile of the random variable T l ε). We introduce the following test statistic: 19) T α = max l {1,...,l n} { T l Y ) ql, u α ) }, where u α is defined as { 20) u α = sup u ]0, 1[, P max l {1,...,l n} We reject the null hypothesis if T α is positive. { T l ε) ql, u) } ) } > 0 α. Comment. When r = 1, the procedure is akin to that proposed by Hall and Heckman 2000) where for all l, ql, u α ) is the 1 α quantile of max l {1,...,ln} T l ε).

10 Y. BARAUD, S. HUET, B. LAURENT 3.2. Extension to the general case. The ideas underlying the preceding procedures extend to the case where R 1. In the general case, the test is obtained as follows. Let l n be such that the dimension d n of the linear space 21) V n = Span { 1 J, x J,, x r J, R 1 J,..., R x r J, J J ln} is not larger than n/2. We define for each J {1,..., n} 22) t J = R xr J Π X J x r J ) γ J where γ J = R x r J Π XJ x r J). We reject that F belongs to K r,r if T α defined by 19) is positive. 3.3. Properties of the tests. In this section, we describe the behavior of the procedure. We start with some notations. Let us define the function ΛF ) as ΛF )x) = dr [Rx)F x)], dxr and let ω be its modulus of continuity defined for all h > 0 by ωh) = sup ΛF )x) ΛF )y). x y h For J l n l=1 J l, let us denote by x J respectively x+ J ) the quantities min {x i, i J} respectively max {x i, i J}) and set h J = x + J x J. Let f = F x 1 ),..., F x n )) and for each l = 1,..., l n and β ]0, 1[, let ) ql, uα ) 23) ν l f, β) = χ 1 n n d dn n, f Π Vn f 2 /σ β/2) + Φ 1 β/2) σ. 2 For each ρ > 0, let { } E n,r ρ) = F : [0, 1] R, F r) H s L), inf F r) x) ρ. x [0,1] We have the following result. Proposition 3. For each β ]0, 1[, we have Let T α be the test statistic defined in Section 3.2. We have sup sup P F,σ T α > 0) = α. σ>0 F K r,r P F,σ T α > 0) 1 β, if for some l {1,..., l n } there exists a set J J l such that either 24) inf ΛF )x r!γ J i) ν l f, β) i J x r J Π X J x r + ωh J), J 2

TESTING CONVEX HYPOTHESES 11 or 25) r!γ J inf ΛF )x) ν l f, β) x ]x J,x+ J [ x r J Π X J x r. J 2 Moreover, if R 1 then there exists some constant κ depending on α, β, s and r only such that for n large enough and for all F E n,r ρ n,r ) with we have σ 2 ) s/1+2s+r)) logn) ρ n,r = κ L 1+2r)/1+2s+r)) n P F,σ T α > 0) 1 β. Comments. 1. In the particular case where R 1, let us give the orders of magnitude of the quantities appearing in the above proposition. Under the assumption that f Π Vn f 2 /n is smaller than σ 2, one can show that ν l is of order logn) see Section 9.2). When R 1, we have γ J = x r J Π X J x r J and it follows from computations that will be detailed in the proofs that 26) r!γ J ν l f, β) x r J Π X J x r C J 2 logn) nh 1+2r J for some constant C which does not depend on J nor n. 2. In the particular case where r = 1, Inequality 26) allows to compare our result to the performances of the test proposed by Ghosal et al. 2001). For each δ ]0, 1/3[, they give a procedure depending on δ) that is powerful if the function F is continuously differentiable and satisfies that for all x in some interval of length n δ, F x) < M logn)n 1 3δ)/2 for some M large enough. By using 25) and the upper bound in 26) with h J of order n δ, we deduce from Proposition 3 that our procedure is powerful too over this class of functions. Note that by considering a multiple testing procedure based on various scales l, our test does not depend on δ and is therefore powerful for all δ simultaneously. 3. For r = 1 respectively r = 2) and s = 1, Dümbgen and Spokoiny 2001) obtained the uniform separation rate ρ n,r for testing monotonicity respectively convexity) in the Gaussian white noise model. 4. For the problem of testing monotonicity r = 1 and R 1), it is possible to combine this procedure with that proposed in Section 2.3. More precisely, consider the test which rejects the null at level 2α if one of these two tests rejects. The so-defined test performs as well as the best of these two tests under the alternative.

12 Y. BARAUD, S. HUET, B. LAURENT 4. A general approach. The problems we have considered previously reduce to testing that f = F x 1 ),..., F x n )) belongs to a convex set of the form 27) C = {f R n, j {1,..., p} < f, v j > 0}, where the vectors {v 1,..., v p } are linearly independent in R n. For example, testing that the regression function F is non-negative or non-decreasing amounts to testing that the mean of Y belongs respectively to the convex subsets of R n 28) C 0 = {f R n, i {1,..., n} f i 0}. and 29) C = {f R n, i {1,..., n 1} f i+1 f i 0}. Clearly, these sets are of the form given by 27) by taking respectively p = n, v j = e j and p = n 1, v j = e j e j+1. The following proposition extends this result to the general case. Note that one can also define the set C as C = {f R n, L 1 f) 0,..., L p f) 0}, where the L i s are p independent linear forms. We shall use this definition of C in the following. Proposition 4. For each r {1,..., n 1} and i {1,..., n r} let φ i,r be the linear form defined for w R n by 1 x i x r 1 i w i 1 x i+1 x r 1 i+1 φ i,r w) = det w i+1...... 1 x i+r x r 1 i+r w i+r If F belongs to K then f = F x 1 ),..., F x n )) belongs to 30) C = {f R n, i {1,..., n 2}, φ i,2 f) 0}. If F belongs to K r,r then f belongs to C r,r = {f R n, i {1,..., n r}, φ i,r R f) 0}. In view of keeping our notations as simple as possible, we omit the dependency of the linear forms φ i,r with respect to r when there is no ambiguity. The remaining part of the section is organized as follows. In the next subsection, we present a general approach for the problem of testing that f belongs to C. In the last subsection, we show how this approach applies to the problems of hypothesis testing considered in Sections 2 and 3.

TESTING CONVEX HYPOTHESES 13 4.1. Testing that f belongs to C. We consider the problem of testing that the vector f = f 1,..., f n ) involved in the regression model 31) Y i = f i + σε i, i = 1,..., n. belongs to C defined by 27). Our aim is twofold. First, build a test which achieves its nominal level, and second, describe for each n a class of vectors over which this test is powerful. The testing procedure. The testing procedure relies on the following idea: since under the assumption that f belongs to C, the quantities < f, p j=1 λ jv j > are non-positive for all non-negative numbers λ 1,, λ p, we base our test statistic on random variables of the form < Y, p j=1 λ jv j > for non-negative sequences of λ j s. We denote by T the subset of R n defined by 32) T = t = p λ j v j, t = 1, λ j 0, j = 1,..., p. j=1 Let T n be a finite subset of T such that there exists some linear space V n, with dimension d n < n containing the linear span of T n. Let {q t α), t T n } be a sequence of numbers satisfying 33) P [ sup t T n We reject the null hypothesis if the statistic 34) n ) ] < ε, t > dn ε Π Vn ε q tα) > 0 = α. T α = sup t T n n dn < Y, t > Y Π Vn Y q tα) is positive. Properties of the test. For all β ]0, 1[ and each t T n let ) 1 35) v t f, β) = q t α) χ 1 n n d dn n, f Π Vn f 2 /σ β/2) + Φ 1 β/2) σ. 2 The order of magnitude of v t f, β) is proved to be logn)σ under the assumption that f Π Vn f 2 /n is smaller than σ 2 as it is shown in the proof of Proposition 1. We have the following result. 36) Theorem 1. Let T α be the test statistic defined by 34). We have sup sup P f,σ T α > 0) = P 0,1 T α > 0) = α. σ>0 f C Moreover, if there exists t T n such that < f, t > v t f, β) then P f,σ T α > 0) 1 β. Comments. The values of the q t α) s that satisfy 33) can be easily obtained by simulations under P 0,1. This property of our procedure lies in the fact that the least-favorable distribution under the null is P 0,1. Note that we do not need to use bootstrap procedures to implement the test. )

14 Y. BARAUD, S. HUET, B. LAURENT 4.2. How to apply these procedures to test qualitative hypotheses? In the sequel, we give the choices of T n and V n leading to the tests presented in Sections 2 and 3. For the test of positivity described in Section 2.2. We take T n = T n,1, with T n,1 = ln l=1 T n,1, l where for all l {1,..., l n } Tn,1 l = 1 e j, J J l J. We take V n = V n,cste. Note that V n,cste is also the linear span of T n,1. j J For the test of monotonicity described in Section 2.3. {2,..., l n } and 1 i < j l, e l ij = Nij l 1 Ji l e l 1 37) Jj l l J l i l J l j Let us define for each l Note that Nij l is such that el ij = 1. We take T n = T n,2, with T n,2 = l n where T l n,2 = { e l ij, 1 i < j l }, and we take V n = V n,cste. Note that V n contains T n,2. e l. l=2 T n,2, l For the test of convexity presented in Section 2.4. Let us define for each l {3,..., l n }, 1 i < j < k l, 38) e l ijk = Nijk l 1 Jj l e l λ l 1 ijk Ji l e l 1 λ l ijk) 1 Jk l e l. l J l j Note that Nijk l is such that el ijk = 1. We take T n = T n,3, with T n,3 = l n where l J l i T l n,3 = { e l ijk, 1 i < j < k l }, and we take V n = V n,cste. Note that V n contains T n,3. For the test of F K r,r presented in Section 3. T n,4 = l n l=1 T l n We take where T l n = { t J, J J l} and V n = V n,4 defined by 21). Note that V n contains T n,4. l J l k l=3 T n,3, l We justify these choices of T n by the following proposition proved in Section 7. Proposition 5. Let C and T n be either C 0, T n,1 ), C, T n,2 ), C, T n,3 ) or C r,r, T n,4 ). There exist v 1,..., v p for which C is of the form given by 27) and for which T, defined by 32), contains T n.

TESTING CONVEX HYPOTHESES 15 5. Simulation studies. In this section we describe how to implement the test for testing F K and we carry out a simulation study in order to evaluate the performances of our tests both when the errors are Gaussian and when they are not. We first describe how the testing procedure is performed, then we present the simulation experiment and finally discuss the results of the simulation study. 5.1. The testing procedures. We carry out the simulation study for the two testing procedures described in Sections 2.3 and 3.1. In the sequel, the procedure based on differences of local means and described in Section 2.3 is called LM and the procedure based on local gradients defined below from the test statistic given in Section 3.1 with r = 1) is called LG. In the case of the procedure LM, we set T LM = T α,2 defined in 11). For each l, the quantiles q 2 l, u α ) are calculated as follows. For u varying among a suitable grid of values u 1,..., u m, we estimate by simulations the quantity { pu j ) = P max T l 2 ε) q 2 l, u j ) } ) > 0, l=1,...,l n ε being a n-sample of N 0, 1), and we take u α as max {u j, pu j ) α}. Note that u α does not depend on x i, i = 1,..., n), but only on the number of observations n. In the case of the procedure LG, the test statistic is defined as follows. For each l = 1,..., l n and for J J l, we take t J = x J1 J x J x J 1 J x J. The space V n reduces to V n,lin the linear space of dimension 2l n generated by { 1J, x J, J J ln}. The test statistic T α takes the form T LG = T α,4 = where for each l {1,..., l n }, { max T l 4 Y ) q 4 l, u α ) }, l=1,...,l n T l 4 Y ) = max J J l n 2ln < Y, t J > Y Π Vn,lin Y, and q 4 l, u α ) denotes the 1 u quantile of the random variable T l 4 ε). The procedure for calculating q 4 l, u α ) for l = 2,..., l n is the same as the procedure for calculating the q 2 l, u α ) s. 5.2. The simulation experiment. The number of observations n equals 100, x i = i/n + 1), for i = 1,..., n and l n is either equal to 15 or 25. We consider three distributions of the errors ε i, with expectation zero and variance 1. 1. The Gaussian distribution: ε i N 0, 1).

16 Y. BARAUD, S. HUET, B. LAURENT 2. The type I distribution: ε i has density sf X µ + sx) where f X x) = exp{ x exp x)} and where µ and s 2 are the expectation and the variance of a variable X with density f X. This distribution is asymmetrical. 3. The mixture of Gaussian distributions: ε i is distributed as πx 1 + 1 π)x 2 where π is distributed as a Bernoulli variable with expectation 0.9, X 1 and X 2 are centered Gaussian variables with variance respectively equal to 2.43s and 25s, π, X 1 and X 2 are independent. The quantity s is chosen such that the variance of ε i equals 1. This distribution has heavy tails. We consider several functions F that are presented below. For each of them, we simulate the observations Y i = F x i ) + σε i. The values of σ 2 and of the distance in sup-norm between F and K are reported in Table 1: d F, K ) = 1 2 sup F s) F t)). 0 s t 1 Let us comment the choice of the considered functions. F 0 x) = 0 corresponds to the case for which the quantiles ql, u α ) are calculated. The function F 1 x) = 151 x 0.5 x 0.5) 3 +0.3x 0.5) exp 250x 0.25) 2 ) presents a strongly increasing part with a pronounced dip around x = 1/4 followed by a nearly flat part on the interval [1/2, 1]. The decreasing linear function F 2 x) = ax, the parameter a being chosen such that a = 1.5σ. The function F 3 x) = 0.2 exp 50x 0.5) 2 ) deviates from F 0 by a smooth dip while the function F 4 x) = 0.1 cos6πx) deviates from F 0 by a cosine function. The functions F 5 x) = 0.2x + F 3 x) and F 6 x) = 0.2x + F 4 x) deviate from an increasing linear function in the same way as F 3 and F 4 do from F 0. Let us mention that it is more difficult to detect that F 5 respectively F 6 ) is nonincreasing than to detect that F 3 respectively F 4 ) is. Indeed, adding an increasing function to a function F reduces the distance in sup-norm between F and K. This is the reason why the values of σ are smaller in the simulation study when we consider the functions F 5 and F 6. In Figure 1 we have displayed the functions F l for l = 1,..., 6 and for each of them one sample simulated with Gaussian errors. The corresponding values of the test statistics T LM and T LG for α = 5% and l n = 25 are given. For this simulated sample, it appears that the test based on the statistic T LM leads to reject the null hypothesis in all cases, while the test based on T LG reject in all cases except for functions F 2 and F 4. The results of the simulation experiment based on 4000 simulations are presented in Tables 2 and 3.

TESTING CONVEX HYPOTHESES 17 F σ 2 d F, K ) F 0 x) 0.01 0 F 1 x) 0.01 0.25 F 2 x) 0.01 0.073 F 3 x) 0.01 0.1 F 4 x) 0.01 0.1 F 5 x) 0.004 0.06 F 6 x) 0.006 0.08 Table 1 Testing monotonicity : simulated functions F, values of σ 2 and distance in sup-norm between F and K. Fig. 1. For each function F l, l = 1,..., 6, the simulated data Y i = F l x i ) + σε i for i = 1,... n are displayed. The errors ε i are Gaussian normalized centered variables. The value of the test statistics T LM and T LG, with α = 5%, are given for each example. l n = 15 l n = 25 Errors Distribution T LM T LG T LM T LG Gaussian 0.049 0.050 0.046 0.051 Type I 0.048 0.072 0.064 0.085 Mixture 0.064 0.117 0.093 0.180 Table 2 Testing monotonicity : levels of the tests based on T LM and T LG. l n = 15 l n = 25 F T LM T LG T LM T LG F 1 0.85 0.99 0.99 1. F 2 0.96 0.96 0.99 0.99 F 3 0.99 0.73 1 0.98 F 4 0.89 0.71 0.99 0.94 F 5 0.99 0.69 0.99 0.87 F 6 0.87 0.79 0.98 0.93 Table 3 Testing monotonicity : powers of the tests based on T LM and T LG when the errors are Gaussian.

18 Y. BARAUD, S. HUET, B. LAURENT 5.3. Comments on the simulation study. As expected, the estimated level of the test calculated for the function F 0 x) = 0 is nearly) equal to α when the errors are distributed as Gaussian variables. When l n = 25, the estimated levels of the tests for the Mixture and Type I errors distributions are greater than α see Table 2). Let us recall that when l n is large, we are considering statistics based on the average of the observations on sets J with small cardinality. Therefore, reducing l n improves the robustness to non-gaussian errors distribution. This is what we get in Table 2 for l n = 15. It also appears that the method based on the local means is more robust than the method based on the local gradients, and that both methods are more robust for the Type I distribution that is asymmetric but not heavy tailed, than for the Mixture distribution. Except for the function F 1, the estimated power is greater for the procedure based on the local means than for the procedure based on the local gradients see Table 3). For both procedures the power of the test is larger with l n = 25 than with l n = 15. However, except for the function F 1, the loss of power is less significant for the procedure based on the local means. 5.4. Comparison with other works. As expected, the power of our procedure T LG for the function F 1 is similar to that obtained by Hall and Heckman 2000). The decreasing linear function F 2 x) = ax has already been studied by Gijbels et al. 2000) with a = 3σ. They get an estimated power of 77%. Gijbels et al. 2000) studied the function 0.075F 3 /0.2 with σ = 0.025 and obtained a simulated power of 98%. With the same function and the same σ, we get a power equal to one, for both procedures and for l n = 15 and l n = 25. Gijbels et al. 2000) and Hall and Heckman 2000) calculated the power of their test for the function F 7 x) = 1 + x a exp 50x 0.5) 2 ) for different values of a and σ. When a = 0.45 and σ = 0.05, we get a power equal to 1 as Gijbels et al do. When a = 0.45 and σ = 0.1, we get a power equal to 76% when using the procedure T LM with l n = 25 or l n = 15. Gijbels et al. 2000) got 80% and Hall and Heckman 2000) a power larger than 87%. 6. Proof of Theorem 1 Level of the test: we first prove that for all t T n, q t α) > 0. Indeed, thanks to 33), we have [ n ] < ε, t > P dn ε Π Vn ε q tα) > 0 [ n ) ] < ε, t > P sup dn t T n ε Π Vn ε q tα) > 0 α < 1/2. Since the random variable n d n < ε, t > / ε Π Vn ε is symmetric distributed as a Student with n d n degrees of freedom) we deduce that q t α) is positive. In

the sequel let us set TESTING CONVEX HYPOTHESES 19 ˆσ n = Y Π Vn Y / n d n. Since for all f C and j {1,..., p}, < f, v j > 0 we have that for all t T n, p λ j < f, v j > < f, t >= p j=1 λ jv j 0. j=1 Hence, < Y, t >=< f, t > +σ < ε, t > σ < ε, t > and therefore for all f C and σ > 0, [ ) ] < ε, t > P f,σ [T α > 0] P f,σ sup ˆσ n /σ q tα) > 0 P f,σ t T n [ ˆσn σ < sup < ε, t > t T n q t α) We now use the following lemma for non-central χ 2 -random variables: Lemma 1. For all u > 0, f R n and σ > 0 P f,σ [ˆσ n < σu] P 0,1 [ˆσ n < u]. This lemma states that a non-central χ 2 -random variable is stochastically larger than a χ 2 -random variable with the same degrees of freedom. For a proof we refer to Lemma 1 in Baraud, Huet and Laurent 2003). Since T n V n, the random variables < ε, t > for t T n are independent of ˆσ n and thus by conditioning with respect to the < ε, t > s and using Lemma 1 we get sup sup σ>0 f C ]. < ε, t > P f,σ [T α > 0] P 0,1 [ˆσ n < sup t T n q t α) = P 0,1 [T α > 0] = α. The reverse inequality being obvious, this concludes the proof of 36). Power of the test: For any f R n and σ > 0 Setting we have P f,σ T α 0) = P f,σ t T n, < Y, t > q t α)ˆσ n ). x n f, β) = It follows that for all f R n and σ > 0, σ n dn χ 1 n d n, f Π Vn f 2 /σ 2 β/2), P f,σ ˆσ n > x n f, β)) = β/2. P f,σ T α 0) inf t T n P f,σ < Y, t > q t α)x n f, β)) + β/2 inf t T n P f,σ σ < ε, t > q t α)x n f, β) < f, t >) + β/2. ]

20 Y. BARAUD, S. HUET, B. LAURENT Since t = 1, < ε, t > is distributed as a standard Gaussian variable, and therefore P f,σ T α 0) β as soon as there exists t T n such that This concludes the proof of Theorem 1. q t α)x n f, β) < f, t > σ Φ 1 β/2). 7. Proof of Propositions 4 and 5 Let us denote by I r the set of increasing sequences of r + 1 indices in {1,..., n} that is 39) I r = {i 1,..., i r+1 ), i 1 < < i r+1, i j {1,..., n}}. For i = i 1,..., i r+1 ) I r and v R n we set 40) φ i v) = det. 1 x i1 x r 1 i 1 1 x i2 x r 1 i 2... 1 x ir+1 x r 1 i r+1 v i1 v i2. v ir+1 For i = i,..., i + r) φ i v) = φ i v) where φ i v) is defined by Equation 30). For w 1,..., w q q vectors of R n, we set Let us define. Gramw 1,..., w q ) = det G) where G = < w i, w j >) 1 i,j q. 41) C r,r = {f R n, i I r, φ i R f) 0}. The proofs of Propositions 4 and 5 rely on the following lemma. 42) Lemma 2. The following equalities hold. First, C r,r = C r,r. Assume that f = F x 1 ),..., F x n )) where F is such that RF is r-th times differentiable. Then for each i I r there exists some c i ]x i1, x ir+1 [ such that 43) φ i R f) = ΛF )c i) φ i x r ). r! For J {1,..., n} let t J be defined by 22). We have 44) < f, t J >= N 1 J φ i R f)φ i x r ). i I r J r+1 where N J = Gram1 J, x J,..., x r 1 J )γ J. The proof of the lemma is delayed to the Appendix.

TESTING CONVEX HYPOTHESES 21 7.1. Proof of Proposition 4. The result concerning K is clear as a function F is non-concave on [0, 1] if and only if for all x, y, z in [0, 1] with x < y < z one has det 1 x F x) 1 y F y) 0. 1 z F z) Let us now turn to the set K r,r. First note that the n r linear forms f φ i,r R f) are independent since the linear space which is generated by {f R n, i {1,..., n r} φ i,r R f) = 0}, 1 R 1, 1 R x,..., 1 R xr 1, is of dimension r. Second, the fact that f belongs to C r,r is a straightforward consequence of 43) since under the assumption that F K r,r, ΛF )x) 0 for all x, and since the Vandermond determinants φ i x r ) are positive for all i I r. 7.2. Proof of Proposition 5. The result is clear in the case where C 0. For the other cases we use the following lemma. Lemma 3. Let W be the orthogonal of the linear space generated by the v j s for j = 1,..., p. If t W satisfies for all f C then < t Π W t, f > 0 t Π W t t Π W t T. Proof. The vector t Π W t belongs to the linear space generated by the v j s and thus one can write g = t Π W t = p j=1 λ jv j. It remains to show that the λ j s are non-negative. Let us fix j 0 {1,..., p} and choose f j0 in R n satisfying < f j0, v j >= 0 for all j j 0 and < f j0, v j0 >< 0. Such a vector exists since the v j s are linearly independent in R n. Clearly f j0 belongs to C and therefore < f j0, g >= λ j0 < f j0, v j0 > 0 which constraints λ j0 to be non-negative. This concludes the proof of Lemma 3. Let us consider the case where C = C. We apply Lemma 3. In this case, W is the linear space generated by 1, we get that for all l {2,..., l n } and 1 i < j l, e l ij satisfies Π W e l ij = 0. Moreover el ij = 1 and f C, < f, e l ij >= Nij l fj f ) l i J l j 0. Let us consider the case where C = C. In this case, p = n 2 and for all j = 1,..., n 2, v j = x j+1 x j+2 )e j + x j+2 x j )e j+1 + x j x j+1 )e j+2.

22 Y. BARAUD, S. HUET, B. LAURENT Since e l ijk = 1, by Lemma 3 it is enough to prove that i) for all f W, < f, e l ijk >= 0, ii) for all f C, < f, e l ijk > 0. First note that for all f R n, 45) < f, e l ijk >= N l ijk fj l j λl ijk f J l i 1 λ l ijk) f J l k ). Clearly if f = 1 or f = x, < f, e l ijk >= 0 and since by definition of C, W is the linear space generated by 1 and x, i) holds true. Let now f C. There exists some convex function F mapping [x 1, x n ] into R such that F x i ) = f i for all i = 1,..., n take the piecewise linear function verifying this property for instance). Let i < j < k and l Jj l. We set Note that 0 µ l ik 1 and that µ l ik = x J l k x l x J l k x J l i. x l = µ l ik x J l i + 1 µ l ik) x J l k. Since F is convex on [x 1, x n ] we have for all l J l j, Note that l J l j F x l ) µ l ikf x J l i ) + 1 µ l ik)f x J l k ) µ l ik f J l i + 1 µ l ik) f J l k. µ l ik / J j l = λl ijk. We derive from the above inequality that f J l j = 1 J l j l J l j which, thanks to 45), leads to ii). F x l ) λ l ijk f J l i + 1 λ l ijk) f J l k, Let us consider the case where C = C r,r. By Lemma 2 we know that C r,r = C r,r and therefore for each i I r, the linear form f φ i R f) is a linear combination of the linear forms f φ i R f) with i = 1,..., n r. Consequently, if w W then for all i I r, φ i R w) = 0. For each J {1,..., n}, t J defined by 22) satisfies t J = 1. By applying 44) with f = w, we get < w, t J > 0 for all w C r,r and < w, t J >= 0 for all w W. Consequently, by Lemma 3, t J belongs to T. 8. Proof of Proposition 1 8.1. Proof for T α, C) = T α,1, C 0 ). We prove the proposition by applying Theorem 1. We decompose the proof into six steps.

TESTING CONVEX HYPOTHESES 23 T 1 Step 1 For all integer N 1, let N u) denote the 1 u quantile of a Student random variable with N degrees of freedom. We have for all u ]0, 1[, ) ) ))} 1 1 2 1 46) T 1 N {log u) 1 + C 1/4 + log 1/2 exp u u N log u for some absolute constant C > 0. F 1 Proof of step 1 Let 1,N u) denote the 1 u quantile of a Fisher variable with 1 and N degrees of freedom, then T 1 N u) = F 1 1,N u). It follows from Lemma 1 in Baraud, Huet and Laurent 2003) that for all u ]0, 1[, N 1, F 1 1,N u) 1 + 2 2 log 1/2 1 u ) + 3N { )) } 4 1 exp 2 N log 1. u Using the inequality expx) 1 x expx) which holds for all x > 0, we obtain ) ) )) 1 1 4 1 F 1 1,N u) 1 + 2 2 log 1/2 + 6 log exp u u N log u and since a + b a + b for all a > 0 and b > 0 F 1 1 1,N {log u) 1 + C 1/4 u for some absolute constant C > 0. ) + log 1/2 1 u ) 2 exp N log ))} 1 u Step 2 47) For all l {1,..., l n }, t T l n,1, we have q t α) = q 1 l, u α ) Cα) logn). Proof of step 2 On the one hand, by definition of q 1 l,.) and thus 48) α = P 0,1 T α,1 > 0) l n l=1 P T l 1 ε) q 1 l, u α ) > 0 ) l n u α, u α α/l n. On the other hand, for all l {1,..., l n } and J J l, the random variables U J = i J ε i n d n ε Π Vn ε J being distributed as Student variables with n d n degrees of freedom, we have that )) P T1 l 1 uα ε) > T n d n J l )) 1 uα 49) U J > T n d n J l u α J J l P

24 Y. BARAUD, S. HUET, B. LAURENT 1 and thus q 1 l, u α ) T n d uα n / J l ). This inequality together with 48) and 46) leads to 47), as J l l n n/2 and n d n = n l n n/2. Step 3 For all f = F x 1 ),..., F x n )) with F H s L), 50) Proof of step 3 f Π Vn,cste f 2 n Note that the vector Cs)L 2 n 2s. belongs to V n,cste and therefore f = l n k=1 F x Jk )1 Jk f Π Vn,cste f 2 f f 2 = l n k=1 l n k=1 Noting that l n = d n n/4, we get 50). i J k F x i ) F x Jk )) 2 L 2 ln 2s i J k = nl 2 l 2s n. Step 4 Assuming that n L/σ) 1/s, there exists some constant C depending on s and β only such that 51) χ 1 n d n, f Π Vn,cste f 2 /σ 2 β/2) n d n C. Using the inequality due to Birgé 2001) on the quantiles of non-central χ 2 we have that χ 1 n d n,a 2 β/2) n d n + a 2 + 2 n d n + 2a 2 ) log2/β) + 2 log2/β). Setting a = f Π Vn,cste f /σ and using 50) we derive that 52) χ 1 n d n,a 2 β/2)/n d n ) Cβ, s). Step 5 Under the assumption of step 4, for all t T n, v t f, β) κ logn)σ, for some constant κ depending on α, β, s only.

TESTING CONVEX HYPOTHESES 25 Proof of step 5 We recall that ) 1 v t f, β) = q t α) χ 1 n n d dn n, f Π Vn f 2 /σ β/2) + Φ 1 β/2) σ. 2 We conclude by using the elementary inequality Φ 1 β/2) 2 log2/β), by gathering 47) and 51). We conclude the proof with this final step. Step 6 There exists a constant κ depending on α, β and s only, such that if n large enough, and F satisfies 53) then, there exists t T n such that 54) min F x) κρ n, x [0,1] < f, t > v t f, β). Proof of step 6 Since F H s L), under Assumption 53), there exists j {1, 2,..., n} such that F j/n) κρ n + Ln s. For n large enough, Ln s κρ n /2, hence F j/n) κρ n /2. Let us take κ satisfying κ 4 = 2κ ) 2s/1+2s) where κ is defined at step 5. Let us define [ ) ] 1/s 4L 55) ln) =, κρ n and J as the element of J ln) containing j. Note that for n large enough, ln) {1,..., l n }. Now, for all k J, since F H s L) and thus, by taking t T n,1 as f k = F x k ) = F x j ) + F x j ) + F x k ) κρ n /2 + L x k x j s κρ n /2 + Lln) s κρ n /4 t = 1 e i, J i J

26 Y. BARAUD, S. HUET, B. LAURENT we derive that < f, t > = J f J J κρ n /4. By construction of the partition of the data, we have for all positive integers p q r that [ ] [ ] r r 56) I r q p,q + 1. q For all j {1,..., ln)}, J = J ln) j disjoint sets of cardinality at least [n/l n ]. Hence J ln) j [ n l n see 8)) is a union of I ln j,ln) [l n/ln)] ] [ ln ln) since [x] x/2 for all x 1. Therefore we get 57) using 55). This implies that n < f, t > 8 by definition of κ. J n/4ln)) n 4 ] κ ρ n 4L n 4ln) ) 1/s κρn ) 1/2s) κρn κ σ logn) 4L 8.2. Proof for T α, C) = T α,2, C ). We follow the proof of Theorem 1 for T α, C) = T α,1, C 0 ) : the results of steps 1 to 5 still hold. The proof of step 2 differs in the following way: Equation 49) becomes P T l 2 ε) > 1 i<j l P T 1 n d n u α T l n,2 )) < ε, e l ij > ε Π Vn,cste ε /n l n ) > T 1 n d n u α T l n,2 )) u α. We conclude the proof of step 2 by noticing that for all l {1,..., l n }, T l n,2 is bounded from above by n 2 /4. Step 6 58) For n large enough, under the assumption that inf F G κρ n, G K there exists t T n,2, such that < t, f > v t f, β).

TESTING CONVEX HYPOTHESES 27 Proof of step 6 Let us first remark that Indeed, let G K be defined as inf F G sup F x) F y)). G K 0 x y 1 Then, G y) = sup F x). 0 x y inf F G F G = sup F x) F y)). G K 0 x y 1 Hence, under Assumption 58), there exists x < y such that F x) F y) κρ n. Since F H s L), if x i x 1/n and x j y 1/n, then F x i ) F x j ) κρ n 2Ln s κρ n /2 for n large enough. Hence, there exists 1 i < j n such that F x i ) F x j ) κρ n /2. Let us set [ ) ] 1/s 8L ln) =, κρ n which belongs to {1,..., l n } at least for n large enough. Let I and J be the elements of J ln) satisfying i I and j J. Arguing as in step 6 of Section 8.1, since F H s L), and we deduce that f I F x i ) Lln) s and f J F x j ) + Lln) s f I f J κρ n /2 2Lln) s κρ n /4. This implies that there exists 1 i < j ln) with I = J ln) i such that < e ln) ln) i j, f >= Ni j fi f ) ln) κρ n J N i j 4. ln) and J = Jj, Using Inequality 56), and since l n = [n/2], we have that for all K J ln), [ ] [ ] ) ln ln 2 K 3 + 1, ln) ln) which implies that N ln) i j = I J I + J C l n ln). We now conclude as in the proof of step 6 by taking t = e ln) i j.