Simultaneous estimation of the mean and the variance in heteroscedastic Gaussian regression


Electronic Journal of Statistics
Vol. 2 (2008) 1345–1372
DOI: 10.1214/08-EJS267

Simultaneous estimation of the mean and the variance in heteroscedastic Gaussian regression

Xavier Gendre

Laboratoire J.-A. Dieudonné, Université de Nice - Sophia Antipolis,
Parc Valrose, Nice Cedex, France
e-mail: gendre@unice.fr

Abstract: Let Y be a Gaussian vector of R^n with mean s and diagonal covariance matrix Γ. Our aim is to estimate both s and the entries σ_i^2 = Γ_{i,i}, for i = 1, ..., n, on the basis of the observation of two independent copies of Y. Our approach is free of any prior assumption on s but requires that we know some upper bound γ on the ratio max_i σ_i^2 / min_i σ_i^2. For example, the choice γ = 1 corresponds to the homoscedastic case where the components of Y are assumed to have a common unknown variance. Conversely, the choice γ > 1 corresponds to the heteroscedastic case where the variances of the components of Y are allowed to vary within some range. Our estimation strategy is based on model selection. We consider a family {S_m × Σ_m, m ∈ M} of parameter sets where S_m and Σ_m are linear spaces. To each m ∈ M, we associate a pair of estimators (ŝ_m, σ̂_m^2) of (s, σ^2) with values in S_m × Σ_m. We then design a model selection procedure that selects some m̂ among M in such a way that the Kullback risk of (ŝ_m̂, σ̂_m̂^2) is as close as possible to the minimum of the Kullback risks over the family of estimators {(ŝ_m, σ̂_m^2), m ∈ M}. We then derive uniform rates of convergence for the estimator (ŝ_m̂, σ̂_m̂^2) over Hölderian balls. Finally, we carry out a simulation study in order to illustrate the performance of our estimators in practice.

AMS 2000 subject classifications: 62G08.
Keywords and phrases: Gaussian regression, heteroscedasticity, model selection, Kullback risk, convergence rate.

Received July 2008.

1. Introduction

Let us consider the statistical framework given by the distribution of a Gaussian vector Y with mean s = (s_1, ..., s_n) ∈ R^n and diagonal covariance matrix

    Γ_σ = diag(σ_1^2, ..., σ_n^2)

where σ^2 = (σ_1^2, ..., σ_n^2) ∈ (0, ∞)^n. The vectors s and σ^2 are both assumed to be unknown. Hereafter, for any t = (t_1, ..., t_n) ∈ R^n and τ^2 = (τ_1^2, ..., τ_n^2) ∈ (0, ∞)^n, we denote by P_{t,τ} the distribution of a Gaussian vector with mean t and covariance matrix Γ_τ = diag(τ_1^2, ..., τ_n^2), and by K(P_{s,σ}, P_{t,τ}) the Kullback-Leibler divergence between P_{s,σ} and P_{t,τ},

    K(P_{s,σ}, P_{t,τ}) = (1/2) Σ_{i=1}^n [ (s_i − t_i)^2 / τ_i^2 + φ(τ_i^2 / σ_i^2) ],

where φ(u) = log u + 1/u − 1, for u > 0. Note that, if the σ_i^2's are known and constant, the Kullback-Leibler divergence reduces, up to a multiplicative constant, to the squared L^2-norm and, in expectation, corresponds to the quadratic risk.

Let us suppose that we observe two independent copies of Y, namely Y^[1] = (Y_1^[1], ..., Y_n^[1]) and Y^[2] = (Y_1^[2], ..., Y_n^[2]). Their coordinates can be expanded as

    Y_i^[j] = s_i + σ_i ε_i^[j],   i = 1, ..., n and j = 1, 2,   (1.1)

where ε^[1] = (ε_1^[1], ..., ε_n^[1]) and ε^[2] = (ε_1^[2], ..., ε_n^[2]) are two independent standard Gaussian vectors. We are interested here in the estimation of the two vectors s and σ^2. Indeed, their behavior contains substantial knowledge about the phenomenon represented by the distribution of Y. We particularly have in mind the case of a variance that stays approximately constant over periods and can take several different values in the course of the observations. Of course, we want to estimate the mean s but, in this particular case, we are also interested in recovering the periods of constancy and the values taken by the variance σ^2. The Kullback-Leibler divergence measures the discrepancy between the two distributions P_{s,σ} and P_{t,τ}; thus, it allows us to deal with the two estimation problems at the same time.

More generally, the aim of this paper is to estimate the pair (s, σ^2) by model selection on the basis of the observation of Y^[1] and Y^[2]. For this, we introduce a collection F = {S_m × Σ_m, m ∈ M} of products of linear subspaces of R^n indexed by a finite or countable set M. In the sequel, these products will be called models and, for any m ∈ M, we will denote by D_m the dimension of S_m × Σ_m. To each m ∈ M, we associate a pair of estimators (ŝ_m, σ̂_m^2) that is similar to the maximum likelihood estimator (MLE). It is well known that, if the σ_i^2's are equal, the estimators of the mean and of the variance factor obtained by maximizing the likelihood are independent. This fact no longer holds if the σ_i^2's are not constant. To recover the independence between the estimators of the mean and the variance, we construct them separately from the two independent copies Y^[1] and Y^[2]. For the estimator ŝ_m of s, we take the MLE based on Y^[1] and, for the estimator σ̂_m^2 of σ^2, we take the MLE based on Y^[2]. Thus, for each m ∈ M, we have a pair of independent estimators (ŝ_m, σ̂_m^2) = (ŝ_m(Y^[1]), σ̂_m^2(Y^[2])) with values in S_m × Σ_m. The Kullback risk of (ŝ_m, σ̂_m^2) is E[K(P_{s,σ}, P_{ŝ_m,σ̂_m})] and is of the order of the sum of two terms,

    inf_{(t,τ) ∈ S_m × Σ_m} K(P_{s,σ}, P_{t,τ}) + D_m.   (1.2)

The first one, called the bias term, represents the capacity of S_m × Σ_m to approximate the true value of (s, σ^2). The second, called the variance term, is proportional to the dimension of the model and corresponds to the amount of noise that we have to control. To guarantee a small risk, these two terms have to be small simultaneously. Indeed, taking the Kullback risk as a quality criterion, a good model is one that minimizes (1.2) among F. Clearly, the choice of such a model depends on the pair of unknown parameters (s, σ^2), which makes good models unavailable to us. So, we have to construct a procedure that selects an index m̂ = m̂(Y^[1], Y^[2]) ∈ M, depending on the data only, such that E[K(P_{s,σ}, P_{ŝ_m̂,σ̂_m̂})] is close to the smallest risk

    R(s, σ^2, F) = inf_{m ∈ M} E[K(P_{s,σ}, P_{ŝ_m,σ̂_m})].

The art of model selection is precisely to provide a procedure, solely based on the observations, that achieves this. The classical way consists in minimizing an empirical penalized criterion that is stochastically close to the risk. Considering the (negative) log-likelihood function with respect to Y^[1], up to an additive constant,

    ∀ t ∈ R^n, ∀ τ^2 ∈ (0, ∞)^n,   L(t, τ^2) = (1/2) Σ_{i=1}^n [ (Y_i^[1] − t_i)^2 / τ_i^2 + log τ_i^2 ],

we choose m̂ as a minimizer over M of the penalized likelihood criterion

    Crit(m) = L(ŝ_m, σ̂_m^2) + pen(m),   (1.3)

where pen is a penalty function mapping M into R_+ = [0, ∞). In this work, we exhibit a form of penalty such that the pair of estimators (ŝ_m̂, σ̂_m̂^2) has a Kullback risk close to R(s, σ^2, F). Our approach is free of any prior assumption on s but requires that we know some upper bound γ ≥ 1 on the ratio σ_max^2 / σ_min^2 ≤ γ, where σ_max^2 (resp. σ_min^2) is the maximum (resp. minimum) of the σ_i^2's. The knowledge of γ allows us to deal equivalently with two different cases. First, γ = 1 corresponds to the homoscedastic case where the components of Y^[1] and Y^[2] are independent with a common variance (i.e. σ_i^2 ≡ σ^2), which may be unknown. On the other side, γ > 1 means that the σ_i^2's can be distinct and are allowed to vary within some range. Such non-constancy of the variances of the observations is known as the heteroscedastic case. Heteroscedasticity arises in many practical situations in which the assumption that the variances of the data are equal is debatable.

The field of model selection has undergone important developments over the last decades and it is beyond the scope of this paper to give an exhaustive historical review of the domain. The interested reader can find a good introduction to model selection in the first chapters of [17]. The first heuristics in the domain are due to Mallows [16] for the estimation of the mean

in homoscedastic Gaussian regression with known variance. In a more general Gaussian framework with common known variance, Barron et al. [7] and Birgé and Massart [9, 10] have designed adaptive model selection procedures to estimate the mean under quadratic risk. They provide non-asymptotic upper bounds for the risk of the selected estimator. When the bound is of the order of the smallest risk among the collection of models, this kind of result is called an oracle inequality. Baraud [5] has generalized their results to homoscedastic statistical models with non-Gaussian noise admitting moments of order larger than 2 and a known variance. All these results remain true for a common unknown variance if some upper bound on it is assumed to be known. Of course, the larger this bound, the worse the results. Assuming that γ is known does not imply the knowledge of such an upper bound.

In the homoscedastic Gaussian framework with unknown variance, Akaike proposed penalties for estimating the mean under quadratic risk (see [1, 2] and [3]). Replacing the variance by a particular estimator in his penalty term, Baraud [5] obtained oracle inequalities for more general noise than Gaussian and for polynomial collections of models. Recently, Baraud, Giraud and Huet [6] constructed penalties able to take into account the complexity of the collection of models for estimating the mean under quadratic risk in a Gaussian homoscedastic model with unknown variance. They also proved results for the estimation of the mean and the variance factor under Kullback risk. This problem is close to ours and corresponds to the case γ = 1. A motivation for the present work was to extend their results to the heteroscedastic case γ > 1, in order to get oracle inequalities by minimization of a penalized criterion as in (1.3). Assuming that the collection of models is not too large, we obtain inequalities of the same flavor, up to a logarithmic factor,

    E[K(P_{s,σ}, P_{ŝ_m̂,σ̂_m̂})] ≤ C inf_{m ∈ M} { inf_{(t,τ) ∈ S_m × Σ_m} K(P_{s,σ}, P_{t,τ}) + D_m log^{1+ε}(D_m) } + R,   (1.4)

where C and R are positive constants depending in particular on γ, and ε is a positive parameter.

A non-asymptotic model selection approach to estimation problems in heteroscedastic Gaussian models has been studied in only a few papers. In Chapter 6 of [4], Arlot estimates the mean in a heteroscedastic regression framework, but for bounded data. For polynomial collections of models, he uses resampling penalties to get oracle inequalities for the quadratic risk. Recently, Galtchouk and Pergamenshchikov [14] have provided an adaptive nonparametric estimation procedure for the mean in a heteroscedastic Gaussian regression model. They obtain an oracle inequality for the quadratic risk under some regularity assumptions. Closer to our problem, Comte and Rozenholc [12] have estimated the pair (s, σ^2). Their estimation procedure is different from ours, which makes the theoretical results difficult to compare. For instance, they proceed in two steps (one for the mean and one for the variance) and they give risk bounds separately for each parameter in L^2-norm, while we estimate the pair (s, σ^2) directly under the Kullback risk.
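As a purely illustrative aid, the Kullback-Leibler divergence and the two-sample observation scheme (1.1) recalled above are straightforward to implement; the Python sketch below (the function names and the random seed are ours, not from the paper) computes K(P_{s,σ}, P_{t,τ}) coordinate by coordinate and draws the two independent copies Y^[1] and Y^[2].

```python
import numpy as np

def kullback(s, sigma2, t, tau2):
    """Kullback-Leibler divergence K(P_{s,sigma}, P_{t,tau}) between two
    Gaussian vectors with diagonal covariances diag(sigma2) and diag(tau2),
    using phi(u) = log(u) + 1/u - 1 evaluated at u = tau2 / sigma2."""
    u = tau2 / sigma2
    phi = np.log(u) + 1.0 / u - 1.0
    return 0.5 * np.sum((s - t) ** 2 / tau2 + phi)

def draw_two_copies(s, sigma2, seed=0):
    """Two independent copies Y^[1], Y^[2] of Y ~ N(s, diag(sigma2)), as in (1.1)."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(sigma2)
    y1 = s + sd * rng.standard_normal(s.size)
    y2 = s + sd * rng.standard_normal(s.size)
    return y1, y2
```

When sigma2 and tau2 are equal and constant, kullback reduces to a scaled squared L^2 distance between s and t, as noted in the introduction.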

As described in [8], one of the main advantages of inequalities such as (1.4) is that they allow us to derive uniform convergence rates for the risk of the selected estimator over many classes of smoothness. Considering a collection of histogram models, we provide convergence rates over Hölderian balls. Indeed, for α_1, α_2 ∈ (0, 1], if s is α_1-Hölderian and σ^2 is α_2-Hölderian, we prove that the risk of (ŝ_m̂, σ̂_m̂^2) converges at a rate of order (log^{1+ε}(n)/n)^{2α/(2α+1)}, where α = min{α_1, α_2} is the worst regularity. To put this rate into perspective, we can think of the homoscedastic case with only one observation of Y. In this case, the optimal rate of convergence in the minimax sense is n^{-2α/(2α+1)} and, up to a logarithmic loss, our rate is comparable to it. To our knowledge, our results on non-asymptotic estimation of the mean and the variance in a heteroscedastic Gaussian model are new.

The paper is organized as follows. The main results are presented in Section 2. In Section 3, we carry out a simulation study in order to illustrate the performance of our estimators in practice, under the Kullback risk and the quadratic risk. The last sections are devoted to the proofs and to some technical results.

2. Main results

We first introduce the collection of models, the estimators and the procedure. We then present the main results, whose proofs can be found in Section 4. In the sequel, we consider the framework (1.1) and, for the sake of simplicity, we suppose that there exists an integer k_n ≥ 0 such that n = 2^{k_n}.

2.1. Model collection and estimators

In order to estimate the mean and the variance, we consider linear subspaces of R^n constructed as follows. Let M be a countable or finite set. To each m ∈ M, we associate a regular partition p_m of {1, ..., n} given by the |p_m| = 2^{k_m} consecutive blocks

    { (i − 1) 2^{k_n − k_m} + 1, ..., i 2^{k_n − k_m} },   i = 1, ..., |p_m|.

For any I ∈ p_m and any x ∈ R^n, let us denote by x_I the vector of R^{n/|p_m|} with coordinates (x_i)_{i ∈ I}. Then, to each m ∈ M, we also associate a linear subspace E_m of R^{n/|p_m|} with dimension 1 ≤ d_m ≤ 2^{k_n − k_m}. This set of pairs (p_m, E_m) allows us to construct a collection of models. Hereafter, we identify each m ∈ M with its corresponding pair (p_m, E_m). For any m = (p_m, E_m) ∈ M, we introduce the subspace S_m ⊂ R^n of the E_m-piecewise vectors,

    S_m = { x ∈ R^n such that ∀ I ∈ p_m, x_I ∈ E_m },

and the subspace Σ_m ⊂ R^n of the piecewise constant vectors,

    Σ_m = { Σ_{I ∈ p_m} g_I 1_I,  g_I ∈ R }.

The dimension of S_m × Σ_m is denoted by D_m = |p_m|(d_m + 1). To estimate the pair (s, σ^2), we only deal with models S_m × Σ_m constructed in such a way. More precisely, we consider a collection of products of linear subspaces

    F = { S_m × Σ_m, m ∈ M },   (2.1)

where M is a set of pairs (p_m, E_m) as above. In the paper, we will often make the following hypothesis on the collection of models:

(H_θ) There exists θ > 1 such that, for any m ∈ M, θ(θ − 1)^{-1} γ (2 + D_m) ≤ n.

This hypothesis avoids handling models whose dimension is too large with respect to the number of observations. Let m ∈ M; we denote by π_m the orthogonal projection onto S_m. We estimate (s, σ^2) by the pair of independent estimators (ŝ_m, σ̂_m^2) ∈ S_m × Σ_m given by

    ŝ_m = π_m Y^[1]  and  σ̂_m^2 = Σ_{I ∈ p_m} σ̂_{m,I}^2 1_I,  where, for I ∈ p_m,  σ̂_{m,I}^2 = (1/|I|) Σ_{i ∈ I} ( Y_i^[2] − (π_m Y^[2])_i )^2.   (2.2)

Thus, we get a collection of estimators {(ŝ_m, σ̂_m^2), m ∈ M}.

2.2. Risk upper bound

We first study the risk on a single model in order to understand its order. Take an arbitrary m ∈ M. We define (s_m, σ_m^2) ∈ S_m × Σ_m by

    s_m = π_m s  and  σ_m^2 = Σ_{I ∈ p_m} σ_{m,I}^2 1_I,  where, for I ∈ p_m,  σ_{m,I}^2 = (1/|I|) Σ_{i ∈ I} [ (s_i − s_{m,i})^2 + σ_i^2 ].

Easy computations show that the pair (s_m, σ_m^2) achieves the minimum of the Kullback-Leibler divergence over S_m × Σ_m,

    inf_{(t,τ) ∈ S_m × Σ_m} K(P_{s,σ}, P_{t,τ}) = K(P_{s,σ}, P_{s_m,σ_m}) = (1/2) Σ_{I ∈ p_m} Σ_{i ∈ I} log( σ_{m,I}^2 / σ_i^2 ).
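As a small numerical illustration of these definitions (ours, not taken from the paper), the sketch below builds a regular partition of the design into 2^{k_m} blocks, projects the first copy onto a low-degree polynomial space playing the role of E_m (one possible choice among many), and computes the blockwise variance estimator from the second copy; the helper names are hypothetical.

```python
import numpy as np

def regular_partition(n, k_m):
    """Split indices {0, ..., n-1} into 2**k_m consecutive blocks of equal size."""
    size = n // 2 ** k_m
    return [np.arange(i * size, (i + 1) * size) for i in range(2 ** k_m)]

def fit_model(y1, y2, k_m, d_m):
    """Estimators (s_hat, sigma2_hat) for the model indexed by (k_m, d_m):
    on each block I, s_hat is the orthogonal projection of y1[I] onto a
    d_m-dimensional polynomial space (our illustrative choice of E_m), and
    sigma2_hat is the mean squared residual of y2[I]."""
    n = y1.size
    s_hat, sigma2_hat = np.empty(n), np.empty(n)
    for I in regular_partition(n, k_m):
        x = np.linspace(0.0, 1.0, I.size)            # local design on the block
        basis = np.vander(x, d_m, increasing=True)   # basis of E_m, dimension d_m
        proj = basis @ np.linalg.pinv(basis)         # orthogonal projection matrix
        s_hat[I] = proj @ y1[I]
        resid = y2[I] - proj @ y2[I]
        sigma2_hat[I] = np.mean(resid ** 2)          # constant over the block
    return s_hat, sigma2_hat
```

With d_m = 1 the projection reduces to the blockwise mean, so both s and σ^2 are estimated by histograms; this is the type of model used in the collection F_PC of Section 2.3 and in the simulation study.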

The next proposition compares the minimal divergence K(P_{s,σ}, P_{s_m,σ_m}) with the Kullback risk of (ŝ_m, σ̂_m^2).

Proposition 1. Let m ∈ M. If the hypothesis (H_θ) is fulfilled, then

    K(P_{s,σ}, P_{s_m,σ_m}) + D_m/(4γ) ≤ E[K(P_{s,σ}, P_{ŝ_m,σ̂_m})] ≤ K(P_{s,σ}, P_{s_m,σ_m}) + κ γ^2 θ D_m,

where κ > 1 is a numerical constant whose explicit value is given in the proof.

As announced in (1.2), this result shows that the Kullback risk of the pair (ŝ_m, σ̂_m^2) is of the order of the sum of a bias term K(P_{s,σ}, P_{s_m,σ_m}) and a variance term which is proportional to D_m. Thus, minimizing the Kullback risk E[K(P_{s,σ}, P_{ŝ_m,σ̂_m})] over m ∈ M amounts to finding a model that realizes a trade-off between these two terms.

Let pen be a non-negative function on M and let m̂ ∈ M be any minimizer of the penalized criterion,

    m̂ ∈ argmin_{m ∈ M} { L(ŝ_m, σ̂_m^2) + pen(m) }.   (2.3)

In the sequel, we denote by (s̃, σ̃^2) = (ŝ_m̂, σ̂_m̂^2) the selected pair of estimators. It satisfies the following result.

Theorem 2. Under the hypothesis (H_θ), suppose that there exist A, B > 0 such that, for any k, d ∈ N,

    M_{k,d} = Card{ m ∈ M such that |p_m| = 2^k and d_m = d } ≤ A (1 + d)^B,   (2.4)

where M is the set defined at the beginning of Section 2.1. Moreover, assume that there exist δ, ε > 0 such that

    ∀ m ∈ M,   D_m ≤ 5δγ n / log^{1+ε}(n).   (2.5)

If we take

    ∀ m ∈ M,   pen(m) = γθ ( 2 + log^{1+ε}(D_m) ) D_m,   (2.6)

then

    E[K(P_{s,σ}, P_{s̃,σ̃})] ≤ C inf_{m ∈ M} { K(P_{s,σ}, P_{s_m,σ_m}) + D_m log^{1+ε}(D_m) } + R,   (2.7)

where R = R(γ, θ, A, B, ε, δ) is a positive constant and C is a positive constant depending only on κ, γ, θ and ε, whose explicit value is given in the proof.

The inequality (2.7) is close to an oracle inequality, up to a logarithmic factor. Thus, with the penalty (2.6), whose order is slightly larger than the dimension of the model, the risk of the estimator provided by the criterion (1.3) is comparable to the minimum over the collection of models F.
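To make the selection rule concrete, here is a self-contained Python sketch (ours, not the author's code) of the criterion (1.3) with the penalty shape (2.6), restricted to histogram models (d_m = 1), i.e. the collection F_PC used later; gamma, theta and eps must be supplied by the user, and k_max should be small enough that every block contains several observations.

```python
import numpy as np

def crit_and_fit(y1, y2, k_m, gamma, theta, eps):
    """Penalized criterion Crit(m) = L(s_hat, sigma2_hat) + pen(m) for the
    histogram model with 2**k_m blocks (d_m = 1), following (1.3) and (2.6)."""
    n = y1.size
    s_hat, sigma2_hat = np.empty(n), np.empty(n)
    for I in np.split(np.arange(n), 2 ** k_m):
        s_hat[I] = y1[I].mean()                        # projection onto constants
        sigma2_hat[I] = np.mean((y2[I] - y2[I].mean()) ** 2)
    # empirical contrast computed on the first copy, cf. (1.3)
    L = 0.5 * np.sum((y1 - s_hat) ** 2 / sigma2_hat + np.log(sigma2_hat))
    D_m = 2 * 2 ** k_m                                 # D_m = |p_m| (d_m + 1)
    pen = gamma * theta * (2 + np.log(D_m) ** (1 + eps)) * D_m
    return L + pen, s_hat, sigma2_hat

def select(y1, y2, k_max, gamma, theta=2.0, eps=1e-2):
    """Estimators minimizing the penalized criterion over k_m = 0, ..., k_max."""
    fits = [crit_and_fit(y1, y2, k, gamma, theta, eps) for k in range(k_max + 1)]
    return min(fits, key=lambda f: f[0])[1:]
```

The search here is one-dimensional over the number of blocks; with a richer E_m one would also loop over d_m and keep the same criterion.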

2.3. Convergence rate

One of the main advantages of an inequality such as (2.7) is that it gives uniform convergence rates with respect to many well-known classes of smoothness. To illustrate this, we consider the particular case of regression on a fixed design. Namely, in the framework (1.1), we suppose that

    ∀ 1 ≤ i ≤ n,   s_i = s_r(i/n)  and  σ_i^2 = σ_r^2(i/n),

where s_r and σ_r^2 are two unknown functions that map [0, 1] to R. In this section, we handle the normalized Kullback-Leibler divergence

    K_n(P_{s,σ}, P_{t,τ}) = (1/n) K(P_{s,σ}, P_{t,τ}),

and, for any α ∈ (0, 1] and any L > 0, we denote by H_α(L) the space of the α-Hölderian functions with constant L on [0, 1],

    H_α(L) = { f : [0, 1] → R : ∀ x, y ∈ [0, 1], |f(x) − f(y)| ≤ L |x − y|^α }.

Moreover, we consider a collection of models F_PC as described in Section 2.1 such that, for any m ∈ M, E_m is the space of dyadic piecewise constant functions on d_m blocks. More precisely, let m = (p_m, E_m) ∈ M and consider the regular dyadic partition p'_m, with |p_m| d_m blocks, that refines p_m. We define S_m as the space of the piecewise constant functions on p'_m,

    S_m = { f = Σ_{I ∈ p'_m} f_I 1_I such that ∀ I ∈ p'_m, f_I ∈ R },

and Σ_m as the space of the piecewise constant functions on p_m,

    Σ_m = { g = Σ_{I ∈ p_m} g_I 1_I such that ∀ I ∈ p_m, g_I ∈ R }.

Then, the collection of models that we consider is F_PC = {S_m × Σ_m, m ∈ M}. Note that this collection satisfies (2.4) with A = 1 and B = 0. The following result gives a uniform convergence rate for (s̃, σ̃^2) over Hölderian balls.

Proposition 3. Let α_1, α_2 ∈ (0, 1], L_1, L_2 > 0 and assume that (H_θ) is fulfilled. Consider the collection of models F_PC and δ, ε > 0 such that, for any m ∈ M, D_m ≤ 5δγ n / log^{1+ε}(n). Denote by (s̃, σ̃^2) the estimator selected with the penalty (2.6). If n satisfies an explicit lower bound, depending only on L_1, L_2, σ^2 and ε,

then

    sup_{(s_r, σ_r^2) ∈ H_{α_1}(L_1) × H_{α_2}(L_2)} E[K_n(P_{s,σ}, P_{s̃,σ̃})] ≤ C ( log^{1+ε}(n) / n )^{2α/(2α+1)},   (2.8)

where α = min{α_1, α_2} and C is a constant which depends on α_1, α_2, L_1, L_2, θ, γ, σ^2, δ and ε.

For the estimation of the mean s under quadratic risk with one observation of Y, Galtchouk and Pergamenshchikov [14] have computed the heteroscedastic minimax risk. Under some assumptions on the regularity of σ_r^2 and assuming that s_r ∈ H_{α_1}(L_1), they show that the optimal rate of convergence in the minimax sense is of order C_{α_1,σ} n^{-2α_1/(2α_1+1)}. Concerning the estimation of the variance vector σ^2 under quadratic risk with one observation of Y and unknown mean, Wang et al. [19] have proved that the minimax rate of convergence for the estimation of σ^2 is of order C_{α_1,α_2} max{ n^{-4α_1}, n^{-2α_2/(2α_2+1)} } when s_r ∈ H_{α_1}(L_1) and σ_r^2 ∈ H_{α_2}(L_2). For α_1, α_2 ∈ (0, 1], the maximum of these two rates is of order n^{-2α/(2α+1)}, where α = min{α_1, α_2} is the worst of the regularities of s_r and σ_r^2. Up to a logarithmic term, the rate of convergence over Hölderian balls achieved by our procedure recovers this rate for the Kullback risk.

3. Simulation study

To illustrate our results, we consider the following pairs of functions (s_r, σ_r^2) defined on [0, 1] and, for each one, we give the true value of γ:

(M1) γ = 4:
    s_r(x) = 2 if 0 ≤ x < 1/4,  0 if 1/4 ≤ x < 1/2,  2 if 1/2 ≤ x < 3/4,  1 if 3/4 ≤ x ≤ 1,
    and σ_r^2(x) = 2 if 0 ≤ x < 1/2,  1/2 if 1/2 ≤ x ≤ 1.

(M2) γ = 1:
    s_r(x) = 1 + sin(2πx + π/3)  and  σ_r^2(x) = 1.

(M3) γ = 7/3:
    s_r(x) = 3x/2  and  σ_r^2(x) = 1/2 + 2 sin^2(4π(x − 1/2))/3.

(M4) γ = 2:
    s_r(x) = 1 + 2 sin(4π(x − 1/2))  and  σ_r^2(x) = (3 + sin(2πx))/2.

Throughout this section, we consider the collection of models F_PC and we take n = 1024 (i.e. k_n = 10). Let us first present how our procedure performs on these examples with, for each simulation, the true value of γ, ε = 10^{-2} and δ = 3

Fig 1. Estimation of the mean (left) and of the variance (right) in the case M1.

Fig 2. Estimation of the mean (left) and of the variance (right) in the case M2.

in assumption (2.5) and the penalty (2.6), with θ = 2. The estimators are drawn as solid lines and the true functions as dotted lines. In the case of M1, we can note that the procedure chooses the good model, in the sense that, if the pair (s_r, σ_r^2) belongs to a model of F_PC, that model is generally the one chosen by our procedure. Repeating the simulation many times within the framework of M1 shows that, with confidence higher than 99.9%, the probability of making this good choice is high. Even if the mean does not belong to any of the S_m's, the procedure recovers the homoscedastic nature of the observations in the case M2: performing the same simulations within the framework induced by M2, the probability of choosing a homoscedastic model is likewise estimated to be high, with a confidence of 99.9%. For the more general frameworks M3 and M4, the estimators perform visually well and detect the changes in the behavior of the mean and of the variance functions.

The parameter γ is supposed to be known and appears in the definition of the penalty (2.6). So, we may naturally ask how important it is to the procedure. In particular, what happens if we do not use the correct value?
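For completeness, and before turning to the sensitivity to γ, we note that the four test configurations are easy to simulate; the sketch below simply transcribes the definitions of M1-M4 given above into Python (the helper names and the random seed are ours).

```python
import numpy as np

def m1(x):
    s = 2.0 * (x < 0.25) + 2.0 * ((0.5 <= x) & (x < 0.75)) + 1.0 * (x >= 0.75)
    v = np.where(x < 0.5, 2.0, 0.5)                      # variance ratio gamma = 4
    return s, v

def m2(x):
    return 1.0 + np.sin(2 * np.pi * x + np.pi / 3), np.ones_like(x)   # gamma = 1

def m3(x):
    s = 1.5 * x
    v = 0.5 + (2.0 / 3.0) * np.sin(4 * np.pi * (x - 0.5)) ** 2        # gamma = 7/3
    return s, v

def m4(x):
    s = 1.0 + 2.0 * np.sin(4 * np.pi * (x - 0.5))
    v = (3.0 + np.sin(2 * np.pi * x)) / 2.0                           # gamma = 2
    return s, v

def simulate(case, n=1024, seed=0):
    """Draw the two copies Y^[1], Y^[2] on the fixed design i/n, i = 1, ..., n."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1) / n
    s, v = case(x)
    y1 = s + np.sqrt(v) * rng.standard_normal(n)
    y2 = s + np.sqrt(v) * rng.standard_normal(n)
    return y1, y2
```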

Fig 3. Estimation of the mean (left) and of the variance (right) in the case M3.

Fig 4. Estimation of the mean (left) and of the variance (right) in the case M4.

The following table presents some estimates of the ratio

    E[K(P_{s,σ}, P_{s̃,σ̃})] / inf_{m ∈ M} E[K(P_{s,σ}, P_{ŝ_m,σ̂_m})]

for several values of γ. These estimated values have been obtained with 500 repetitions each. The main part of the computation time is devoted to the estimation of the oracle risk. In the cases M1, M3 and M4, the ratio does not suffer too much from small errors in the knowledge of γ. The most affected case is the homoscedastic one, but we see that the best estimation is obtained for the true value of γ, as we could expect. More generally, it is interesting to observe that, even if there is a small error on the value of γ, the ratio remains reasonably small.

In the regression framework with heteroscedastic noise, we may also be interested in separate estimations of the mean and the variance functions. Because our procedure provides a simultaneous estimation of these two functions, we may ask how our estimators s̃ and σ̃^2 perform individually. Considering the quadratic risks E[‖s − s̃‖^2] and E[‖σ^2 − σ̃^2‖^2] of s̃ and σ̃^2 respectively, it is interesting to compare them to the minimal quadratic risks among the collections of estimators {ŝ_m, m ∈ M} and {σ̂_m^2, m ∈ M}.

Table 1. Ratio between the Kullback risk of (s̃, σ̃^2) and that of the oracle, for several values of γ, in the cases M1-M4.

Table 2. Ratio between the L^2-risk of s̃ and the minimal one among the ŝ_m's, for several values of γ, in the cases M1-M4.

Table 3. Ratio between the L^2-risk of σ̃^2 and the minimal one among the σ̂_m^2's, for several values of γ, in the cases M1-M4.

To illustrate this, we give below two sets of estimates of the ratios

    E[‖s − s̃‖^2] / inf_{m ∈ M} E[‖s − ŝ_m‖^2]  and  E[‖σ^2 − σ̃^2‖^2] / inf_{m ∈ M} E[‖σ^2 − σ̂_m^2‖^2]

in the frameworks presented at the beginning of this section. We can observe from these estimates that the quadratic risks of our estimators are quite close to the minimal ones among the collection of models.

4. Proofs

For any I ⊂ {1, ..., n} and any x, y ∈ R^n, we introduce the notations

    ⟨x, y⟩_I = Σ_{i ∈ I} x_i y_i  and  ‖x‖_I^2 = Σ_{i ∈ I} x_i^2.

Let m ∈ M. We will use several times in the proofs the fact that, for any I ∈ p_m,

    |I| σ̂_{m,I}^2 stochastically dominates σ_min^2 χ^2(|I| − d_m),   (4.1)

where χ^2(|I| − d_m) is a χ^2 random variable with |I| − d_m degrees of freedom.

13 X Gendre/Simultaneous estimation of the mean and the variance Proof of the proposition 1 Recalling and using the independence between ŝ m and ˆσ m, we expand the Kullback risk of ŝ m, ˆσ m, E [KP s,σ, Pŝm,ˆσ m ] = 1 [ si ŝ m,i ] ˆσm,I E + φ 4 ˆσ m,i σ i i I = 1 [ ] 1 E E [ ] s ŝ m I ˆσ m,i + 1 [ E log ˆσ m,i + σ ] i 1 + log σ m,i σ m,i ˆσ m,i σ i i I = KP s,σ, P sm,σ m + 1 [ ] ˆσm,I I E φ σ m,i + 1 [ σi + s i s m,i ] σ m,i E ˆσ m,i i I + 1 [ ] 1 [ ] E π m Γ 1/ σ ε[1] I where E 1 = 1 [ I E φ E ˆσ m,i = KP s,σ, P sm,σ m + E 1 + E 43 ˆσm,I σ m,i ] and E = 1 E [ 1 ˆσ m,i To upper bound the first expectation, note that I p m, E[ˆσ m,i ] = σ m,i 1 π m,i,i σ i = σ m,i 1 ρ I I where i I 1 ρ I = π m,i,i σ i 0, 1 I σ m,i i I ] π m,i,i σ i We apply the lemmas 10 and 11 to each block I p m and, by concavity of the logarithm, we get [ ] [ ] [ ] ˆσm,I ˆσm,I σm,i E φ log E + E 1 σ m,i σ m,i ˆσ m,i log1 ρ I + 1 κγ ρ I I d m ρ I + 1 κγ ρ I I d m 1 ρ κγ I + 1 ρ I I d m i I

14 X Gendre/Simultaneous estimation of the mean and the variance 1358 Using H θ and the fact that ρ I γd m / I, we obtain E 1 1 I ρ κγ I + 1 ρ I I d m 1 γ θ p m d m γdm κγ I + I γd m I γd m I d m + κγ θ p m 44 The second expectation in 43 is easier to upper bound by using 41 and the fact that d m 1, E = 1 [ ] 1 π m,i,i σ i 1 E ˆσ m,i i I γ I d m I d m 3 γθ p m d m We now sum 44 and 45 to obtain 45 E 1 + E γ θ p m d m + κγ θ p m κγ θ D m For the lower bound, the positivity of φ in 4 and the independence between ŝ m and ˆσ m give us E [KP s,σ, Pŝm,ˆσ m ] 1 [ s ŝm ] I E ˆσ m,i 1 [ E s ŝm I] E [ˆσ m,i ] 1 s s m I + σ d m s s m I + I d mσ I It is obvious that the hypothesis H θ ensures d m I / Thus, we get σ d m I d m σ and E [KP s,σ, Pŝm,ˆσ m ] 1 I σ d m I d m σ p m d m D m γ 4γ To conclude, we know that ŝ m, ˆσ m S m Σ m and, by definition of s m, σ m, it implies E [KP s,σ, Pŝm,ˆσ m ] KP s,σ, P sm,σ m

15 X Gendre/Simultaneous estimation of the mean and the variance Proof of theorem We prove the following more general result: Theorem 4 Let α 0, 1 and consider a collection of positive weights {x m } m M If the hypothesis H θ is fulfilled and if then 1 αe [K P s,σ, P s, σ ] m M, penm γθd m + x m, 46 inf m M {E [K P s,σ, Pŝm,ˆσ m ] + penm} + R 1 M + R M where R 1 M and R M are defined by R 1 M = Cθ γ m M pm d m Cθ γ p m d m log1 + d m x m log1+dm and R M = α + γθ + 1 α m M p m exp n θ p m log 1 + α p m x m γnα + In these expressions, is the integral part and C is a positive constant that could be taken equal to 1 e/ e 1 Before proving this result, let us see how it implies the theorem The choice 6 for the penalty function corresponds to x m = D m log 1+ǫ D m in 46 Applying the previous theorem with α = 1/ leads us to with and E [K P s,σ, P s, σ ] inf m M {E [K P s,σ, Pŝm,ˆσ m ] + penm} + Cθ γr 1 + 8γθ + 1R R 1 = m M pm d m Cθ γ p m d m log1 + d m m M x m log1+dm R = p m exp n θ p m log 1 + p m x m 5γn Using the upper bound on the risk of the proposition 1, we easily obtain the coefficient of the infimum in 7 Thus, it remains to prove that the two quantities R 1 and R can be upper bounded independently of n For this, we denote

16 X Gendre/Simultaneous estimation of the mean and the variance 1360 by B = B + logcθ γ + 1 and we compute R 1 = m M k 0 d 1 pm d m M k,d k/ d Cθ γ p m d m log1 + d m p m 1 + d m log 1+ǫ p m 1 + d m Cθ γ k/ log1 + d k log + log1 + d 1+ǫ A 1 + d B k/ k/ log1 + d k log + log1 + d 1+ǫ k 0 d 1 AR 1 + R 1 We have split the sum in two terms, the first one is for d = 1, R 1 = B log k log + log k 0 B 1+ǫ = log ǫ The other part R 1 is for d and is equal to k 0 d k 0 log1+dm log1+d 1 k ǫ < log1+d log1+d 1 + d B k log1+d 1/ log1 + d k log + log1 + d 1+ǫ Noting that 1 < log1 + d log1 + d, we have R 1 k/ + d k 0 d 1 B exp ǫ log1 + d log log1 + d 1 + d B ǫ loglog1+d < 1 d We now handle R Our choice of x m = D m log 1+ǫ D m and the hypothesis 5 imply p m x m 5γn δ p m = 1 δ p m δ p m We recall that, for any a 0, 1, if 0 t 1 a/a, then log1+t at Take a = δ p m to obtain log 1 + p m x m p m x m 5γn 5δ p m + 1γn x m 5δ + 1γn

17 X Gendre/Simultaneous estimation of the mean and the variance 1361 For any positive t, 1 + t 1+ǫ 1 + t 1+ǫ, then we finally obtain R = p m exp n θ p m log 1 + p m x m 5γn m M x m p m exp 10θγδ + 1 p m m M M k,d k exp 1 + dlog1+ǫ k 1 + d 10θγδ + 1 where we have set k 0 d 1 AR R R = k 0exp k log k log 1+ǫ < 5θγδ + 1 and R = exp B log1 + d 1 + dlog1+ǫ 1 + d < 10θγδ + 1 d 1 We now have to prove theorem 4 For an arbitrary m M, we begin the proof by expanding the Kullback-Leibler divergence of s, σ, KP s,σ, P s, σ = 1 n s i s i σi + φ σ i σ i = KP s,σ, Pŝm,ˆσ m + [Lŝ m, ˆσ m KP s,σ, Pŝm,ˆσ m ] By the definition 3 of ˆm, the inequality + [L s, σ Lŝ m, ˆσ m ] + [KP s,σ, P s, σ L s, σ] L s, σ Lŝ m, ˆσ m penm pen ˆm 47 is true for any m M The difference between the divergence and the likelihood can be expressed as KP s,σ, Pŝm,ˆσ m Lŝ m, ˆσ m = 1 σi 1 1 ε [1] i 48 ˆσ m,i i I s i ŝ m,i σ i ε [1] i 1 ˆσ m,i n ε [1] i + log σi

18 X Gendre/Simultaneous estimation of the mean and the variance 136 Using 47 and 48, for any α 0, 1, we can write 1 αkp s,σ, P s, σ 49 where, for any m M, KP s,σ, Pŝm,ˆσ m + penm + Gm + W 1 ˆm + W ˆm + Z ˆm pen ˆm W 1 m = 1 ˆσ m,i πm Γ 1/ σ ε[1] I, W m = 1 s m s, Γ 1/ σ ˆσ ε[1] α m,i I s m s I 1 ε [1] Zm = 1 i I σi ˆσ m,i 1 i I i αφ ˆσm,I σ i and Gm = 1 s ŝ m, Γ 1/ σ ε [1] ˆσ 1 σi 1 1 ε [1] i m,i I ˆσ m,i We split the proof of theorem 4 in several lemmas Lemma 5 For any m M, we have E[Gm] 0 Proof Let us compute this expectation to obtain the inequality By independence between ε [1] and ε [], we get [ E Gm ε []] = 1 [ E π m Γ 1/ σ ε [1], Γ 1/ σ ε [1] ] ˆσ m,i I = 1 [ ] πm E Γ 1/ σ ε [1] ˆσ m,i I It leads to E[Gm] = E [ E [ Gm ε [] ]] 0 In order to control Zm, we split it in two terms that we study separately, where and Z + m = 1 Z m = 1 i I i I Zm = Z + m + Z m σi 1 ˆσ m,i + σi 1 ˆσ m,i 1 ε [1] i ε [1] i, ˆσm,I αφ ½ˆσm,I σi 1 αφ σ i ˆσm,I σ i ½ˆσm,I>σi

19 X Gendre/Simultaneous estimation of the mean and the variance 1363 Lemma 6 Let m M and x be a positive number Under the hypothesis H θ, we get E [ ] γθ p m Z + m x + exp n d m + 3 p m log 1 + α p m x α p m γn Proof We begin by setting, for all 1 i n, T i m = n σ i /ˆσ m,i 1 + j=1 σ j/ˆσ m,j 1 + 1/ and we denote by Sm = n T i m 1 ε [1] i We lower bound the function φ by the remark a 0, 1, u [a, 1], 1 u 1 a φu Thus, we obtain n σi 1 max ˆσ m,i i n + and we use this inequality to get σ i ˆσ m,i n φ j=1 ˆσm,j Z + m = 1 n 1/ σi 1 Sm α n φ ˆσ m,i + Mm Sm + α n ˆσm,i φ σ i 1 σ i max Sm + 4α i n ˆσ m,i σ j ½ˆσm,j σ j = Mm ½ˆσm,i σ i ˆσm,i σ i ½ˆσm,i σ i To control Sm, we use the inequality 4 in [15], conditionally to ε [] Let u > 0, [ σ i P max Sm σ ] i ε [] + u = E P Sm u/ max i n ˆσ m,i i n ˆσ m,i [ E exp u ] 4 min ˆσ m,i i n σ i By the remark 41, we can upper bound it by σ i P max Sm + u E i n ˆσ m,i [ exp u ] 4γ min X I

20 X Gendre/Simultaneous estimation of the mean and the variance 1364 where the X I s are iid random variables with a χ I d m 1/ I distribution For any λ > 0, we know that the Laplace transform of X I is given by E [ ] e λxi = 1 + λ I dm 1/ I Let t > 0, the following expectation is dominated by [ E Z + m γ ] α + t = P Z + m γt 0 α + u du [ αu E exp 0 γ + t ] min X I du [ αu E max exp γ + t ] X I du Using H θ and 410, we roughly upper bound the maximum by the sum of the Laplace transforms and we get [ E Z + m γ ] α + t γ I 1 + t I dm 3/ α I d m 3 I γθ p m exp n d m + 3 p m log 1 + t p m α p m n Take t = αx/γ to conclude Lemma 7 Let m M and x be a positive number, then E [ Z m α + 1x + ] α + 1 α e αx Proof Note that for all u > 1, we have φu 1 u 1 Let t > 0, we handle Z m conditionally to ε [] and, using the previous lower bound on φ, we obtain P Z m α + 1 ε α t [] P 1 n σi 1 ˆσ m,i ε [1] i 1 α + 1 α t + α 4 n σi 1 ˆσ m,i ε []

21 P 1 X Gendre/Simultaneous estimation of the mean and the variance 1365 n Let us note that σi 1 ˆσ m,i ε [1] i 1 t + t σi max 1 1, i n ˆσ m,i thus, we can apply the inequality 41 from [15] to get P Z m α + 1 α t exp t/ This inequality leads us to [ E Z m α + 1 ] α t + Take t = αx to get the announced result + n α+1t/α α + 1 α e t σi 1 ˆσ m,i ε [] PZ m udu It remains to control W 1 m and W m For the first one, we now prove a Rosenthal-type inequality Lemma 8 Consider any m M Under the hypothesis H θ, for any x > 0, we have E[W 1 m γθd m x + ] Cθ γ p m d m Cθ γ log1+dm p m d m log1 + d m x where is the integral part and C is a positive constant that could be taken equal to C = 1 e e 1 Proof Using the lemma 10 and the remark 41, we dominate W 1 m, W 1 m W 1m = γ I d m I d m 1 F I = γnd m n p m 1 + d m F I where the F I s are iid Fisher random variables of parameters d m, n/ p m d m 1 We denote by F m the distribution of the F I s and we have γ D m γ p m d m E[W 1m] γθ p m d m γθd m Take x > 0 and an integer q > 1, then E [ W 1 m E[W 1 m] x ] E [ W 1 m E[W ] 1 + m]q q 1x q 1

22 X Gendre/Simultaneous estimation of the mean and the variance 1366 We set V = W 1m E[W 1m] It is the sum of the independent centered random variables X I = γnd m n p m 1 + d m F I E[F I ], I p m To dominate E [ V q +], we use the theorem 9 in [11] Let us compute E[X I] = γ n d m n 3 p m p m n p m d m + 3 n p m d m + 5 γ θ 3 p m d m and so, E [ V+] q 1/q 1κ γ θ 3 p m d m q + qκ [ E max X I q] 1/q e e 1 where κ = We consider q = 1 + log1 + d m where is the integral part For this choice, q 1 + d m and it implies p m q < n p m 1 + d m The hypothesis H θ allows us to make a such choice We roughly upper bound the maximum by the sum and we use H θ to get [ E max X I q] [ γθd m q E max F I E[F I ] q] γθd m q q 1 E[F m ] q + p m E[Fm q ] γθ q d m + p m γθdm 1 + q 1/d m 1 p m q/n p m 1 + d m 6γθ d m q pm Thus, it gives E [ V+] q 1/q γθ 1κ p m d m q + 6κ p m 1/q d m q 6κ γθ pm d m q + p m 1/q d m q Injecting this inequality in 411 leads to E [ W 1m E[W 1m] x + ] Cγθ p m d m 1κ γθ p m d m 1 + log1 + d m Cγθ log1+dm p m d m 1 + log1 + d m x q

23 X Gendre/Simultaneous estimation of the mean and the variance 1367 Lemma 9 Consider any m M and let x be a positive number Under the hypothesis H θ, we have E [ ] γθ p m W m x + exp n d m + 3 p m log 1 + α p m x α p m γn Proof Let us define Am = s s m I ˆσ m,i The distribution of W m conditionally to ε [] is Gaussian with mean equal to αam/ and variance factor 1/ Γ σ s s m I ˆσ m,i If ζ is a standard Gaussian random variable, it is well known that, for any λ > 0, Pζ λ e λ 41 We apply the Gaussian inequality 41 to W m conditionally to ε [], t t > 0, P W m + α Am Γ 1/ σ s s m I e t It leads to ˆσ m,i ε [] P W m + α Am σ i ε [] tam max e t i n ˆσ m,i and thus, by the remark 41, P W m γt α max X 1 I p I ε [] P m W m t α max i n σ i ε [] e t ˆσ m,i where the X I s are iid random variables with a χ I d m 1/ I distribution Finally, we integrate following ε [] and we get [ PW m t E max exp αt ] γ X I We finish as we did for Z + m, [ E W m γ ] α + t + [ αu E max exp 0 γ + t γθ 1 + t I dm 3/ α I γθ p m exp n d m + 3 p m log α p m X I ] du 1 + t p m n

24 X Gendre/Simultaneous estimation of the mean and the variance 1368 In order to end the proof of theorem 4, we need to put together the results of the previous lemmas Because γ 1, for any x > 0, we can write e αx exp n p m log 1 + α p m x γn We now come back to 49 and we apply the preceding results to each model Let m M, we take x = x m + α and, recalling 46, we get the following inequalities 1 αe[kp s,σ, P s, σ ] [ E[KP s,σ, Pŝm,ˆσ m ] + penm + E W 1 ˆm γθd ˆm x ] ˆm + α + [ + E W ˆm x ] [ ˆm + E Z + ˆm x ] ˆm + α + + α + [ ] x ˆm + E Z ˆm 1 + α + α E[Km] + penm + R 1 M + R M 413 where R 1 M and R M are the sums defined in the theorem 4 As the choice of m is arbitrary, we can take the infimum among m M in the right part of Proof of the proposition 3 For the collection F PC, we have A = 1 and B = 0 in 4 Let m M, we denote by σ m Σ m the quantity σ m = σ m,i ½ I with I p m, σ m,i = 1 σ i I The theorem gives us E [K n P s,σ, P s, σ ] C n inf { KPs,σ, P sm,σ m + D m log 1+ǫ } R D m + m M n C n inf { KPs,σ, P sm, σ m + D m log 1+ǫ } R D m + m M n { s sm C inf + σ σ m } + D m M nσ nσ m log 1+ǫ D m + R n because, for any x > 0, φx x 1/x i I

25 X Gendre/Simultaneous estimation of the mean and the variance 1369 Assuming s r, σ r H α1 L 1 H α L, we know see [13] that s s m nl 1 p m d m α1 and Thus, we obtain σ σ m nl p m α E [K n P s,σ, P s, σ ] C inf m M If α 1 < α, we can take and { L 1 σ p m d m α1 + L σ p m d m = p m = L 1 n σ log 1+ǫ n L n σ log 1+ǫ n } p m α + log1+ǫ n D m + R n n 1/1+α1 1/1+α For α 1 α, this choice is not allowed because it would imply d m = 0 So, in this case, we take L d m = 1 and p m = 1 σ + L 1/1+α n σ log 1+ǫ n In the two situation, we obtain the announced result 5 Technical results This section is devoted to some useful technical results Some notations previously introduced can have a different meaning here Lemma 10 Let Σ be a positive symmetric n n-matrix and σ 1,, σ n > 0 be its eigenvalues Let P be an orthogonal projection of rank D 1 If we denote M = P ΣP, then M is a non-negative symmetric matrix of rank D and, if τ 1,, τ D are its positive eigenvalues, we have min σ i min τ i and max τ i max σ i 1 i n 1 i D 1 i D 1 i n Proof We denote by Σ 1/ the symmetric square root of Σ By a classical result, M has the same rank, equal to D, than PΣ 1/ On a first side, we have

26 X Gendre/Simultaneous estimation of the mean and the variance 1370 max τ PΣPx, x i = sup 1 i D x R n x x 0 PΣx, x = sup x 1,x kerp imp x 1 + x x 1,x 0,0 On the other side, we can write min τ i = min 1 i D V R n dimv =n D+1 Σx, x sup max x imp x σ i 1 i n x 0 = min V R n dimv =n D+1 min V R n dimv =n D+1 min V R n dimv 1 max x V x 0 max x V x 0 max x V x 0 Mx, x x Σ 1/ Px max x V imp x 0 Σ 1/ x x x Σ 1/ x x = min 1 i n σ i Lemma 11 Let ε be a standard Gaussian vector in R n, a = a 1,, a n R n and b 1,, b n > 0 We denote by b resp b the maximum resp minimum of the b i s If n > and Z = n a i + b i ε i, then [ ] 1 E κb /b Z E[Z] n where κ > 1 is a constant that can be taken equal to 1 + e Proof We recall that E[Z] = n i=0 a i + b i and, for any λ > 0, the Laplace transform of a i + b i ε i is [ E exp λa i + b i ε i ] = exp Thus, the Laplace transform of Z is equal to λa i λb i log1 + λb i ψλ = E [ e λz] n λa i = exp 1 n log1 + λb i 1 + λb i n = e λe[z] λ a i exp b i 1 n rλb i 1 + λb i i=0

27 X Gendre/Simultaneous estimation of the mean and the variance 1371 where rx = log1 + x x for all x > 0 To compute the expectation of the inverse of Z, we integrate ψ by parts, [ ] 1 E = ψλdλ Z 0 n = e λe[z] λ a i exp b i 1 n rλb i dλ 1 + λb i where = 0 1 E[Z] + 1 E[Z] f a,b λ = n i=0 0 i=0 f a,b λψλdλ λb i + 4λa i b i1 + λb i 1 + λb i 1 + λb i We now upper bound the integral, [ ] E[Z] E Z 1 = f a,b λ exp n λa i /1 + λb i n dλ 1 + λbi where we have set nλb 1 + λb 0 g a,b λ = 1+n/ dλ 4b 1 + λb 1 + λb 1+n/ g a,bλe ga,bλ dλ n λa i 1 + λb i For any t > 0, te t e 1 Because g a,b is a positive function and n >, we obtain [ ] E[Z] E Z 1 nλb λb 1+n/dλ + 4b 1 + λb 0 e1 + λb 1+n/dλ b /b + 4b /b n + b /b n enn b /b n e 1 b /b n n 1 en References [1] Akaike, H 1969 Statistical predictor identification Annals Inst Statist Math MR08633 [] Akaike, H 1973 Information theory and an extension of the maximum likelihood principle In Proceedings nd International Symposium on Information Theory, P Petrov and F Csaki, Eds Akademia Kiado, Budapest, MR048315

[3] Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. on Automatic Control.
[4] Arlot, S. (2007). Rééchantillonnage et sélection de modèles. PhD thesis, Université Paris 11.
[5] Baraud, Y. (2000). Model selection for regression on a fixed design. Probab. Theory Related Fields 117.
[6] Baraud, Y., Giraud, C. and Huet, S. (2006). Gaussian model selection with an unknown variance. To appear in the Annals of Statistics.
[7] Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113.
[8] Birgé, L. and Massart, P. (1997). From model selection to adaptive estimation. In Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics.
[9] Birgé, L. and Massart, P. (2001a). Gaussian model selection. Journal of the European Mathematical Society 3(3).
[10] Birgé, L. and Massart, P. (2001b). A generalized Cp criterion for Gaussian model selection. Prépublication 647, Universités de Paris 6 and Paris 7.
[11] Boucheron, S., Bousquet, O., Lugosi, G. and Massart, P. (2005). Moment inequalities for functions of independent random variables. Annals of Probability 33(2).
[12] Comte, F. and Rozenholc, Y. (2002). Adaptive estimation of mean and volatility functions in auto-regressive models. Stochastic Processes and their Applications 97(1).
[13] DeVore, R. and Lorentz, G. (1993). Constructive Approximation. Vol. 303. Springer-Verlag.
[14] Galtchouk, L. and Pergamenshchikov, S. (2005). Efficient adaptive nonparametric estimation in heteroscedastic regression models. Preprint, Université Louis Pasteur, IRMA.
[15] Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection. Annals of Statistics 28(5).
[16] Mallows, C. (1973). Some comments on Cp. Technometrics 15.
[17] McQuarrie, A. and Tsai, C. (1998). Regression and Time Series Model Selection. World Scientific Publishing Co., Inc.
[18] R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[19] Wang, L. et al. (2008). Effect of mean on variance function estimation in nonparametric regression. The Annals of Statistics 36(2).


More information

Least squares under convex constraint

Least squares under convex constraint Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption

More information

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1 Variable Selection in Restricted Linear Regression Models Y. Tuaç 1 and O. Arslan 1 Ankara University, Faculty of Science, Department of Statistics, 06100 Ankara/Turkey ytuac@ankara.edu.tr, oarslan@ankara.edu.tr

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Bayesian Regularization

Bayesian Regularization Bayesian Regularization Aad van der Vaart Vrije Universiteit Amsterdam International Congress of Mathematicians Hyderabad, August 2010 Contents Introduction Abstract result Gaussian process priors Co-authors

More information

NON ASYMPTOTIC MINIMAX RATES OF TESTING IN SIGNAL DETECTION

NON ASYMPTOTIC MINIMAX RATES OF TESTING IN SIGNAL DETECTION NON ASYMPTOTIC MINIMAX RATES OF TESTING IN SIGNAL DETECTION YANNICK BARAUD Abstract. Let Y = (Y i ) i I be a finite or countable sequence of independent Gaussian random variables of mean f = (f i ) i I

More information

Vapnik-Chervonenkis Dimension of Axis-Parallel Cuts arxiv: v2 [math.st] 23 Jul 2012

Vapnik-Chervonenkis Dimension of Axis-Parallel Cuts arxiv: v2 [math.st] 23 Jul 2012 Vapnik-Chervonenkis Dimension of Axis-Parallel Cuts arxiv:203.093v2 [math.st] 23 Jul 202 Servane Gey July 24, 202 Abstract The Vapnik-Chervonenkis (VC) dimension of the set of half-spaces of R d with frontiers

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006 Least Squares Model Averaging Bruce E. Hansen University of Wisconsin January 2006 Revised: August 2006 Introduction This paper developes a model averaging estimator for linear regression. Model averaging

More information

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( )

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( ) Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio (2014-2015) Etienne Tanré - Olivier Faugeras INRIA - Team Tosca October 22nd, 2014 E. Tanré (INRIA - Team Tosca) Mathematical

More information

21.2 Example 1 : Non-parametric regression in Mean Integrated Square Error Density Estimation (L 2 2 risk)

21.2 Example 1 : Non-parametric regression in Mean Integrated Square Error Density Estimation (L 2 2 risk) 10-704: Information Processing and Learning Spring 2015 Lecture 21: Examples of Lower Bounds and Assouad s Method Lecturer: Akshay Krishnamurthy Scribes: Soumya Batra Note: LaTeX template courtesy of UC

More information

Nonparametric Inference via Bootstrapping the Debiased Estimator

Nonparametric Inference via Bootstrapping the Debiased Estimator Nonparametric Inference via Bootstrapping the Debiased Estimator Yen-Chi Chen Department of Statistics, University of Washington ICSA-Canada Chapter Symposium 2017 1 / 21 Problem Setup Let X 1,, X n be

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

MAJORIZING MEASURES WITHOUT MEASURES. By Michel Talagrand URA 754 AU CNRS

MAJORIZING MEASURES WITHOUT MEASURES. By Michel Talagrand URA 754 AU CNRS The Annals of Probability 2001, Vol. 29, No. 1, 411 417 MAJORIZING MEASURES WITHOUT MEASURES By Michel Talagrand URA 754 AU CNRS We give a reformulation of majorizing measures that does not involve measures,

More information

Inverse problems in statistics

Inverse problems in statistics Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) Yale, May 2 2011 p. 1/35 Introduction There exist many fields where inverse problems appear Astronomy (Hubble satellite).

More information

Stein s Method for Matrix Concentration

Stein s Method for Matrix Concentration Stein s Method for Matrix Concentration Lester Mackey Collaborators: Michael I. Jordan, Richard Y. Chen, Brendan Farrell, and Joel A. Tropp University of California, Berkeley California Institute of Technology

More information

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model. Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of

More information

ECE 275A Homework 7 Solutions

ECE 275A Homework 7 Solutions ECE 275A Homework 7 Solutions Solutions 1. For the same specification as in Homework Problem 6.11 we want to determine an estimator for θ using the Method of Moments (MOM). In general, the MOM estimator

More information

University of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May

University of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May University of Luxembourg Master in Mathematics Student project Compressed sensing Author: Lucien May Supervisor: Prof. I. Nourdin Winter semester 2014 1 Introduction Let us consider an s-sparse vector

More information

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach By Shiqing Ling Department of Mathematics Hong Kong University of Science and Technology Let {y t : t = 0, ±1, ±2,

More information

OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1

OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 The Annals of Statistics 1997, Vol. 25, No. 6, 2512 2546 OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 By O. V. Lepski and V. G. Spokoiny Humboldt University and Weierstrass Institute

More information

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5.

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5. VISCOSITY SOLUTIONS PETER HINTZ We follow Han and Lin, Elliptic Partial Differential Equations, 5. 1. Motivation Throughout, we will assume that Ω R n is a bounded and connected domain and that a ij C(Ω)

More information

On the Goodness-of-Fit Tests for Some Continuous Time Processes

On the Goodness-of-Fit Tests for Some Continuous Time Processes On the Goodness-of-Fit Tests for Some Continuous Time Processes Sergueï Dachian and Yury A. Kutoyants Laboratoire de Mathématiques, Université Blaise Pascal Laboratoire de Statistique et Processus, Université

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Concentration inequalities and the entropy method

Concentration inequalities and the entropy method Concentration inequalities and the entropy method Gábor Lugosi ICREA and Pompeu Fabra University Barcelona what is concentration? We are interested in bounding random fluctuations of functions of many

More information

Bayesian Nonparametric Point Estimation Under a Conjugate Prior

Bayesian Nonparametric Point Estimation Under a Conjugate Prior University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-15-2002 Bayesian Nonparametric Point Estimation Under a Conjugate Prior Xuefeng Li University of Pennsylvania Linda

More information

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University

More information

Asymptotics for posterior hazards

Asymptotics for posterior hazards Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and

More information

On the asymptotic normality of frequency polygons for strongly mixing spatial processes. Mohamed EL MACHKOURI

On the asymptotic normality of frequency polygons for strongly mixing spatial processes. Mohamed EL MACHKOURI On the asymptotic normality of frequency polygons for strongly mixing spatial processes Mohamed EL MACHKOURI Laboratoire de Mathématiques Raphaël Salem UMR CNRS 6085, Université de Rouen France mohamed.elmachkouri@univ-rouen.fr

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

Statistics 910, #5 1. Regression Methods

Statistics 910, #5 1. Regression Methods Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known

More information

Inverse problems in statistics

Inverse problems in statistics Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) YES, Eurandom, 10 October 2011 p. 1/32 Part II 2) Adaptation and oracle inequalities YES, Eurandom, 10 October 2011

More information

Some functional (Hölderian) limit theorems and their applications (II)

Some functional (Hölderian) limit theorems and their applications (II) Some functional (Hölderian) limit theorems and their applications (II) Alfredas Račkauskas Vilnius University Outils Statistiques et Probabilistes pour la Finance Université de Rouen June 1 5, Rouen (Rouen

More information

Nonlinear Programming Models

Nonlinear Programming Models Nonlinear Programming Models Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Nonlinear Programming Models p. Introduction Nonlinear Programming Models p. NLP problems minf(x) x S R n Standard form:

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Discussion of High-dimensional autocovariance matrices and optimal linear prediction,

Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Electronic Journal of Statistics Vol. 9 (2015) 1 10 ISSN: 1935-7524 DOI: 10.1214/15-EJS1007 Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Xiaohui Chen University

More information

Modelling Non-linear and Non-stationary Time Series

Modelling Non-linear and Non-stationary Time Series Modelling Non-linear and Non-stationary Time Series Chapter 2: Non-parametric methods Henrik Madsen Advanced Time Series Analysis September 206 Henrik Madsen (02427 Adv. TS Analysis) Lecture Notes September

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

Discussion of Regularization of Wavelets Approximations by A. Antoniadis and J. Fan

Discussion of Regularization of Wavelets Approximations by A. Antoniadis and J. Fan Discussion of Regularization of Wavelets Approximations by A. Antoniadis and J. Fan T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania Professors Antoniadis and Fan are

More information

On a Class of Multidimensional Optimal Transportation Problems

On a Class of Multidimensional Optimal Transportation Problems Journal of Convex Analysis Volume 10 (2003), No. 2, 517 529 On a Class of Multidimensional Optimal Transportation Problems G. Carlier Université Bordeaux 1, MAB, UMR CNRS 5466, France and Université Bordeaux

More information

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012 Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM

More information

Key Words: first-order stationary autoregressive process, autocorrelation, entropy loss function,

Key Words: first-order stationary autoregressive process, autocorrelation, entropy loss function, ESTIMATING AR(1) WITH ENTROPY LOSS FUNCTION Małgorzata Schroeder 1 Institute of Mathematics University of Białystok Akademicka, 15-67 Białystok, Poland mszuli@mathuwbedupl Ryszard Zieliński Department

More information

Segmentation of the mean of heteroscedastic data via cross-validation

Segmentation of the mean of heteroscedastic data via cross-validation Segmentation of the mean of heteroscedastic data via cross-validation 1 UMR 8524 CNRS - Université Lille 1 2 SSB Group, Paris joint work with Sylvain Arlot GDR Statistique et Santé Paris, October, 21 2009

More information