On the estimation of the mean of a random vector


Emilie Joly, Université Paris Ouest Nanterre, France; emilie.joly@u-paris10.fr
Gábor Lugosi, ICREA and Department of Economics, Pompeu Fabra University, Barcelona, Spain; gabor.lugosi@upf.edu
Roberto Imbuzeiro Oliveira, IMPA, Rio de Janeiro, RJ, Brazil; rimfo@impa.br

July 8, 2016

Supported by the French Agence Nationale de la Recherche (ANR), under grant ANR-13-BS (project SPADRO). Supported by the Spanish Ministry of Economy and Competitiveness, Grant MTM P and FEDER, EU. Support from CNPq, Brazil via Ciência sem Fronteiras grant # / . Supported by a Bolsa de Produtividade em Pesquisa from CNPq, Brazil. Supported by FAPESP Center for Neuromathematics (grant# 2013/ , FAPESP - S. Paulo Research Foundation).

Abstract

We study the problem of estimating the mean of a multivariate distribution based on independent samples. The main result is the proof of existence of an estimator with a non-asymptotic sub-Gaussian performance for all distributions satisfying some mild moment assumptions.

1 Introduction

Let $X$ be a random vector taking values in $\mathbb{R}^d$. We assume throughout the paper that the mean vector $\mu = \mathbb{E}X$ and covariance matrix $\Sigma = \mathbb{E}(X-\mu)(X-\mu)^T$ exist. Given $n$ independent, identically distributed samples $X_1,\dots,X_n$ drawn from the distribution of $X$, one wishes to estimate the mean vector. A natural and popular choice is the sample mean $(1/n)\sum_{i=1}^n X_i$, which is known to have a near-optimal behavior whenever the distribution is sufficiently light tailed. However, whenever heavy tails are a concern, the sample mean is to be avoided as it may have a suboptimal performance. While the one-dimensional case (i.e., $d = 1$) is quite well understood (see [3], [5]), various aspects of the multidimensional problem are still to be revealed. This paper aims at contributing to the understanding of the multi-dimensional case.

Before stating the main results, we briefly survey properties of some mean estimators of real-valued random variables. Some of these techniques serve as basic building blocks for the estimators we propose for the vector-valued case.

1.1 Estimating the mean of a real-valued random variable

When $d = 1$, the simplest and most popular mean estimator is the sample mean $\bar\mu_n = (1/n)\sum_{i=1}^n X_i$. The sample mean is unbiased and the central limit theorem guarantees an asymptotically Gaussian distribution. However, unless the distribution of $X$ has a light (e.g., sub-Gaussian) tail, there are no non-asymptotic sub-Gaussian performance guarantees for $\bar\mu_n$. We refer the reader to Catoni [3] for details. However, perhaps surprisingly, there exist estimators of $\mu$ with much better concentration properties, see Catoni [3] and Devroye, Lerasle, Lugosi, and Oliveira [5].

A conceptually simple and quite powerful estimator is the so-called median-of-means estimator that has been proposed, in different forms, in various papers, see Nemirovsky and Yudin [14], Hsu [8], Jerrum, Valiant, and Vazirani [10], Alon, Matias, and Szegedy [1]. The median-of-means estimator is defined as follows. Given a positive integer $b$ and $x_1,\dots,x_b \in \mathbb{R}$, let $q_{1/2}$ denote the median of these numbers, that is,

$$q_{1/2}(x_1,\dots,x_b) = x_i, \quad \text{where } \#\{k \in [b] : x_k \le x_i\} \ge \frac{b}{2} \text{ and } \#\{k \in [b] : x_k \ge x_i\} \ge \frac{b}{2}.$$

(If several $i$ fit the above description, we take the smallest one.) For any fixed $\delta \in [e^{1-n/2}, 1)$, first choose $b = \lceil \ln(1/\delta) \rceil$ and note that $b \le n/2$ holds. Next, partition $[n] = \{1,\dots,n\}$ into $b$ blocks $B_1,\dots,B_b$, each of size $|B_i| \ge \lfloor n/b \rfloor \ge 2$.
Given $X_1,\dots,X_n$, we compute the sample mean in each block

$$Y_i = \frac{1}{|B_i|} \sum_{j \in B_i} X_j$$

and define the median-of-means estimator by $\hat\mu_n^{(\delta)} = q_{1/2}(Y_1,\dots,Y_b)$. One can show (see, e.g., Hsu [8]) that for any $n \ge 4$,

$$\mathbb{P}\left\{ \left|\hat\mu_n^{(\delta)} - \mu\right| > 2e\sqrt{\frac{2\,\mathrm{Var}(X)\,(1+\ln(1/\delta))}{n}} \right\} \le \delta, \qquad (1)$$

where $\mathrm{Var}(X)$ denotes the variance of $X$. Note that the median-of-means estimator $\hat\mu_n^{(\delta)}$ does not require any knowledge of the variance of $X$. However, it depends on the desired confidence level $\delta$ and the partition $B_1,\dots,B_b$. Any partition satisfying $|B_i| \ge \lfloor n/b \rfloor$ for all $i$ is valid in order to get (1). Hence, we do not keep the dependence on the partition $B_1,\dots,B_b$ in the notation $\hat\mu_n^{(\delta)}$.

Devroye, Lerasle, Lugosi, and Oliveira [5] introduce estimators that work for a large range of confidence levels under some mild assumptions. Catoni [3] introduces estimators of quite different flavor and gets a non-asymptotic result of the same form as (1). Bubeck, Cesa-Bianchi and Lugosi [2] apply these estimators in the context of bandit problems.

1.2 Estimating the mean of random vectors

Consider now the multi-dimensional case when $d > 1$. The sample mean $\bar\mu_n = (1/n)\sum_{i=1}^n X_i$ is still an obvious choice for estimating the mean vector $\mu$. If $X$ has a multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, then $\bar\mu_n$ is also multivariate normal with mean $\mu$ and covariance matrix $(1/n)\Sigma$ and therefore, for $\delta \in (0,1)$, with probability at least $1-\delta$,

$$\|\bar\mu_n - \mu\| \le \sqrt{\frac{\mathrm{Tr}(\Sigma)}{n}} + \sqrt{\frac{2\lambda_{\max}\log(1/\delta)}{n}}, \qquad (2)$$

where $\mathrm{Tr}(\Sigma)$ and $\lambda_{\max}$ denote the trace and largest eigenvalue of the covariance matrix, respectively (Hanson and Wright [7]). For non-Gaussian and possibly heavy-tailed distributions, one cannot expect such a sub-Gaussian behavior of the sample mean. The main goal of this paper is to investigate under what conditions it is possible to define mean estimators that reproduce a (non-asymptotic) sub-Gaussian performance similar to (2).

Lerasle and Oliveira [11], Hsu and Sabato [9], and Minsker [13] extend the median-of-means estimator to more general spaces. In particular, Minsker's results imply that for each $\delta \in (0,1)$ there exists a mean estimator $\tilde\mu_n^{(\delta)}$ and a universal constant $C$ such that, with probability at least $1-\delta$,

$$\left\|\tilde\mu_n^{(\delta)} - \mu\right\| \le C\sqrt{\frac{\mathrm{Tr}(\Sigma)\log(1/\delta)}{n}}. \qquad (3)$$

While this bound is quite remarkable (note that no assumption other than the existence of the covariance matrix is made), it does not quite achieve a sub-Gaussian performance bound that resembles (2). An instructive example is when all eigenvalues are identical and equal to $\lambda_{\max}$. If the dimension $d$ is large, (2) is of the order of $\sqrt{(\lambda_{\max}/n)(d + \log(\delta^{-1}))}$ while (3) gives the order $\sqrt{(\lambda_{\max}/n)(d\log(\delta^{-1}))}$.
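The median-of-means estimator of Section 1.1 can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function name `median_of_means` is hypothetical, the blocks are taken to be consecutive index ranges (any partition with $|B_i| \ge \lfloor n/b \rfloor$ is allowed), and for an even number of blocks the lower of the two middle block means is returned, which is one value satisfying the definition of $q_{1/2}$.

```python
import math
import numpy as np

def median_of_means(x, delta):
    """Median-of-means estimate of the mean of a 1-D sample (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Number of blocks b = ceil(ln(1/delta)), clipped to the range [1, n // 2].
    b = max(1, min(math.ceil(math.log(1.0 / delta)), n // 2))
    # Consecutive blocks of nearly equal size; each has at least floor(n / b) points.
    blocks = np.array_split(x, b)
    block_means = np.sort([blk.mean() for blk in blocks])
    # Lower median of the block means.
    return float(block_means[(b - 1) // 2])
```

For a sample with a single large outlier, such as eight ones followed by the value 100 with `delta = 0.05`, the outlier only affects one of the three block means, so the estimate stays at 1.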
The main result of this paper is the construction of a mean estimator that, under some mild moment assumptions, achieves a sub-Gaussian performance bound in the sense of (2). More precisely, we prove the following.

Theorem 1 For all $\delta \in (0,1)$ there exists a mean estimator $\hat\mu_n^{(\delta)}$ and a universal constant $C$ such that if $X_1,\dots,X_n$ are i.i.d. random vectors in $\mathbb{R}^d$ with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma$ such that there exists a constant $K > 0$ such that, for all $v \in \mathbb{R}^d$ with $\|v\| = 1$,

$$\mathbb{E}\left[\left((X-\mu)^T v\right)^4\right] \le K\,(v^T\Sigma v)^2,$$

then for all $n \ge CK\log d\,(d + \log(1/\delta))$,

$$\left\|\hat\mu_n^{(\delta)} - \mu\right\| \le C\left(\sqrt{\frac{\mathrm{Tr}(\Sigma)}{n}} + \sqrt{\frac{\lambda_{\max}\log(\log d/\delta)}{n}}\right).$$

The theorem guarantees the existence of a mean estimator whose performance matches the sub-Gaussian bound (2), up to the additional term of the order of $\sqrt{(1/n)\lambda_{\max}\log\log d}$, for all distributions satisfying the fourth-moment assumption given above. The additional term is clearly of minor importance. (For example, it is dominated by the first term whenever $\mathrm{Tr}(\Sigma) > \lambda_{\max}\log\log d$.) With the estimator we construct, this term is inevitable. On the other hand, the inequality of the theorem only holds for sample sizes that are at least a constant times $d\log d$. This feature is not desirable for truly high-dimensional problems, especially taking into account that Minsker's bound is dimension-free.

The fourth-moment assumption can be interpreted as a boundedness assumption of the kurtosis of $(X-\mu)^T v$. The same assumption has been used in Catoni [4] and Giulini [6] for the robust estimation of the Gram matrix. The fourth-moment assumption may be weakened to an analogous $(2+\varepsilon)$-th moment assumption that we do not detail for the clarity of the exposition.

We prove the theorem by constructing an estimator in several steps. First we construct an estimator that performs well for spherical distributions (i.e., for distributions whose covariance matrix has a trace comparable to $d\lambda_{\max}$). This estimator is described in Section 2. In the second step, we decompose the space in a data-dependent way into the orthogonal sum of $O(\log d)$ subspaces such that, for all but one of the subspaces, the projection of $X$ to the subspace has a spherical distribution. The last subspace is such that the projection has a covariance matrix with a small trace. In each subspace we apply the first estimator and combine them to obtain the final estimator $\hat\mu_n^{(\delta)}$. The proof below provides an explicit value of the constant $C$, though no attempt has been made to optimize its value.

The constructed estimator is computationally so demanding that even for moderate values of $d$ it is hopeless to compute it in reasonable time.
In this sense, Theorem 1 should be regarded as an existence result. It is an interesting and important challenge to construct estimators with similar statistical performance that can be computed in polynomial time (as a function of $n$ and $d$). Note that the estimator of Minsker cited above may be computed by solving a convex optimization problem, making it computationally feasible; see also Hsu and Sabato [9] for further computational considerations.

2 An estimator for spherical distributions

In this section we construct an estimator that works well whenever the distribution of $X$ is sufficiently spherical in the sense that a positive fraction of the eigenvalues of the covariance matrix is of the same order as $\lambda_{\max}$. More precisely, for $c \ge 1$, we call a distribution $c$-spherical if $d\lambda_{\max} \le c\,\mathrm{Tr}(\Sigma)$.
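To make the definition concrete, the smallest admissible $c$ for a given covariance matrix is $d\lambda_{\max}/\mathrm{Tr}(\Sigma)$. The following sketch computes it with NumPy; the function name `sphericity_constant` is hypothetical and is used here only for illustration.

```python
import numpy as np

def sphericity_constant(cov):
    """Smallest c with d * lambda_max <= c * Tr(cov), for a symmetric PSD matrix cov."""
    cov = np.asarray(cov, dtype=float)
    d = cov.shape[0]
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    lam_max = np.linalg.eigvalsh(cov)[-1]
    return d * lam_max / np.trace(cov)
```

An identity covariance gives the value 1 (the most spherical case), while a rank-one covariance in dimension $d$ gives the value $d$.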

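For each $\delta \in (0,1)$ and unit vector $w \in S^{d-1}$ (where $S^{d-1} = \{x \in \mathbb{R}^d : \|x\| = 1\}$), we may define $m_n^{(\delta)}(w)$ as the median-of-means estimate (as defined in Section 1.1) of $w^T\mu = \mathbb{E}\,w^TX$ based on the i.i.d. sample $w^TX_1,\dots,w^TX_n$. Let $N_{1/2} \subset S^{d-1}$ be a minimal $1/2$-cover, that is, a set of smallest cardinality that has the property that for all $u \in S^{d-1}$ there exists $w \in N_{1/2}$ with $\|u - w\| \le 1/2$. It is well known (see, e.g., [12, Lemma ]) that $|N_{1/2}| \le 8^d$. Noting that $\mathrm{Var}(w^TX) \le \lambda_{\max}$, by (1) and the union bound, we have that, with probability at least $1-\delta$,

$$\sup_{w \in N_{1/2}} \left| m_n^{(\delta/8^d)}(w) - w^T\mu \right| \le 2e\sqrt{\frac{2\lambda_{\max}\ln(e8^d/\delta)}{n}}.$$

In other words, if, for $\lambda > 0$, we define the empirical polytope

$$P_{\delta,\lambda} = \left\{ x \in \mathbb{R}^d : \sup_{w \in N_{1/2}} \left| m_n^{(\delta/8^d)}(w) - w^Tx \right| \le 2e\sqrt{\frac{2\lambda\ln(e8^d/\delta)}{n}} \right\},$$

then with probability at least $1-\delta$, $\mu \in P_{\delta,\lambda_{\max}}$. In particular, on this event, $P_{\delta,\lambda_{\max}}$ is nonempty. Suppose that an upper bound $\lambda \ge \lambda_{\max}$ of the largest eigenvalue of the covariance matrix is available. Then we may define the mean estimator

$$\hat\mu_{n,\lambda}^{(\delta)} = \begin{cases} \text{any element } y \in P_{\delta,\lambda} & \text{if } P_{\delta,\lambda} \ne \emptyset, \\ 0 & \text{otherwise.} \end{cases}$$

Now suppose that $\mu \in P_{\delta,\lambda}$ and let $y \in P_{\delta,\lambda}$ be arbitrary. Define $u = (y-\mu)/\|y-\mu\| \in S^{d-1}$, and let $w \in N_{1/2}$ be such that $\|w - u\| \le 1/2$. (Such a $w$ exists by definition of $N_{1/2}$.) Then

$$\|y-\mu\| = u^T(y-\mu) = (u-w)^T(y-\mu) + w^T(y-\mu) \le \frac{1}{2}\|y-\mu\| + 4e\sqrt{\frac{2\lambda\ln(e8^d/\delta)}{n}},$$

where we used Cauchy-Schwarz and the fact that $y, \mu \in P_{\delta,\lambda}$. Rearranging, we obtain that, on the event that $\mu \in P_{\delta,\lambda}$,

$$\left\|\hat\mu_{n,\lambda}^{(\delta)} - \mu\right\| \le 8e\sqrt{\frac{2\lambda\,(d\ln 8 + \ln(e/\delta))}{n}},$$

provided that $\lambda \ge \lambda_{\max}$. Summarizing, we have proved the following.

Proposition 1 Let $\lambda > 0$ and $\delta \in (0,1)$. For any distribution with mean $\mu$ and covariance matrix $\Sigma$ such that $\lambda_{\max} = \|\Sigma\| \le \lambda$, the estimator $\hat\mu_{n,\lambda}^{(\delta)}$ defined above satisfies, with probability at least $1-\delta$,

$$\left\|\hat\mu_{n,\lambda}^{(\delta)} - \mu\right\| \le 8e\sqrt{\frac{2\lambda\,(d\ln 8 + \ln(e/\delta))}{n}}.$$

In particular, if the distribution is $c$-spherical and $\lambda \le 2\lambda_{\max}$, then

$$\left\|\hat\mu_{n,\lambda}^{(\delta)} - \mu\right\| \le 16e\sqrt{\frac{c\,\mathrm{Tr}(\Sigma)\ln 8 + \lambda_{\max}\ln(e/\delta)}{n}}.$$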
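The rearrangement behind Proposition 1 can be written out step by step. The shorthand $r$ below is introduced only for this note and is not part of the original argument.

```latex
% Shorthand (an assumption of this note): r = \sqrt{2\lambda \ln(e 8^d/\delta)/n}.
\begin{align*}
|w^T(y-\mu)|
  &\le \bigl|w^T y - m_n^{(\delta/8^d)}(w)\bigr| + \bigl|m_n^{(\delta/8^d)}(w) - w^T\mu\bigr|
   \le 2e\,r + 2e\,r = 4e\,r, \\
\|y-\mu\| &\le \tfrac12\,\|y-\mu\| + 4e\,r
   \;\Longrightarrow\; \|y-\mu\| \le 8e\,r, \\
\ln(e\,8^d/\delta) &= d\ln 8 + \ln(e/\delta)
   \;\Longrightarrow\; 8e\,r = 8e\sqrt{\frac{2\lambda\,(d\ln 8 + \ln(e/\delta))}{n}}.
\end{align*}
```

For the $c$-spherical case, substituting $\lambda \le 2\lambda_{\max}$ and $d\lambda_{\max} \le c\,\mathrm{Tr}(\Sigma)$ into the last expression turns $8e\sqrt{2\lambda(\cdot)/n}$ into the $16e\sqrt{(\cdot)/n}$ form stated in the proposition.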

The bound we obtained has the same sub-Gaussian form as (2), up to a multiplicative constant, whenever the distribution is $c$-spherical. To make the estimator fully data-dependent, we need to find an estimate $\hat\lambda$ that falls in the interval $[\lambda_{\max}, 2\lambda_{\max}]$, with high probability. This may be achieved by splitting the sample in two parts of equal size (assuming $n$ is even), estimating $\lambda_{\max}$ using samples from one part and computing the mean estimate defined above using the other part. In the next section we describe such a method as a part of a more general procedure.

3 Empirical eigendecomposition

In the previous section we presented a mean estimate that works well for spherical distributions. We will use this estimator as a building block in the construction of an estimator that has the desirable performance guarantee for distributions with any covariance matrix. In addition to finite covariances, we assume that there exists a constant $K > 0$ such that, for all $v \in \mathbb{R}^d$ with $\|v\| = 1$,

$$\mathbb{E}\left[\left((X-\mu)^T v\right)^4\right] \le K\,(v^T\Sigma v)^2. \qquad (4)$$

In this section we assume that

$$n \ge 2(400e)^2 K \log_{3/2} d\,\left(d\log 25 + \log(2\log_{3/2} d) + \log(1/\delta)\right).$$

The basic idea is the following. We split the data into two equal halves. We use the first half in order to decompose the space into the sum of orthogonal subspaces such that the projection of $X$ into each subspace is 4-spherical. Then we may estimate the projected means by the estimator of the previous section.

Next we describe how we obtain an orthogonal decomposition of the space based on $n$ i.i.d. observations $X_1,\dots,X_n$. Let $s = \lceil \log_{3/2} d^2 \rceil$ and $m = \lfloor n/s \rfloor$. Divide the sample into $s$ blocks, each of size at least $m$. In what follows, we describe a way of sequentially decomposing $\mathbb{R}^d$ into the orthogonal sum of $s+1$ subspaces $\mathbb{R}^d = V_1 \oplus \cdots \oplus V_{s+1}$. First we construct $V_1$ using the first block $X_1,\dots,X_m$ of observations. Then we use the second block to build $V_2$, and so on, for $s$ blocks.
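The block parameters and the sample-size requirement of this section are simple to evaluate. The sketch below is illustrative only: the function names are hypothetical, the rounding in $s$ and $m$ (ceiling and floor) is an assumption where the extracted text is ambiguous, all logarithms without a base are taken to be natural, and $d \ge 2$ is required so that $s \ge 1$.

```python
import math

def block_parameters(n, d):
    """Return (s, m) with s = ceil(log_{3/2}(d^2)) blocks of size m = floor(n / s)."""
    s = math.ceil(2 * math.log(d) / math.log(1.5))
    m = n // s
    return s, m

def min_sample_size(d, delta, K):
    """Right-hand side of the sample-size condition of Section 3 (natural logs assumed)."""
    log32_d = math.log(d) / math.log(1.5)  # log base 3/2 of d
    return 2 * (400 * math.e) ** 2 * K * log32_d * (
        d * math.log(25) + math.log(2 * log32_d) + math.log(1 / delta)
    )
```

For example, `block_parameters(1200, 10)` gives 12 blocks of 100 points each. Evaluating `min_sample_size` for moderate $d$ shows how large the explicit constant is, which is consistent with the remark that no attempt was made to optimize it.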
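The key properties we need are that (a) the random vector $X$, projected to any of these subspaces, has a 4-spherical distribution; (b) the largest eigenvalue of the covariance matrix of $X$, projected on $V_i$, is at most $\lambda_{\max}(2/3)^{i-1}$.

To this end, just like in the previous section, let $N_\gamma \subset S^{d-1}$ be a minimal $\gamma$-cover of the unit sphere $S^{d-1}$ for a sufficiently small constant $\gamma \in (0,1)$. The value $\gamma = 1/100$ is sufficient for our purposes and in the sequel we assume this value. Note that $|N_\gamma| \le (4/\gamma)^d$ (see [12, Lemma ] for a proof of this fact).

Initially, we use the first block $X_1,\dots,X_m$. We may assume that $m$ is even. Using these observations, for each $u \in N_\gamma$, we compute an estimate $V_m^{(\delta)}(u)$ of $u^T\Sigma u = \mathbb{E}(u^T(X-\mu))^2 = (1/2)\,\mathbb{E}(u^T(X-X'))^2$, where $X'$ is an i.i.d. copy of $X$. We may construct the estimate by forming $m/2$ i.i.d. random variables

$$\tfrac12\left(u^T(X_1 - X_{m/2+1})\right)^2, \dots, \tfrac12\left(u^T(X_{m/2} - X_m)\right)^2$$

and estimate their mean by the median-of-means estimate $V_m^{(\delta)}(u)$ with parameter $\delta/(s(4/\gamma)^d)$. Then (1), together with assumption (4), implies that, with probability at least $1-\delta/s$,

$$\sup_{u \in N_\gamma} \frac{\left|u^T\Sigma u - V_m^{(\delta)}(u)\right|}{u^T\Sigma u} \le 4e\sqrt{\frac{K\log(s(4/\gamma)^d/\delta)}{m}} \stackrel{\mathrm{def.}}{=} \varepsilon_m.$$

Our assumptions on the sample size guarantee that $\varepsilon_m < 1/100$. The event that the inequality above holds is denoted by $E_1$ so that $\mathbb{P}\{E_1\} \ge 1-\delta/s$.

Let $M_{\delta,m}$ be the set of all symmetric positive semidefinite $d \times d$ matrices $M$ satisfying

$$\sup_{u \in N_\gamma} \frac{\left|u^TMu - V_m^{(\delta)}(u)\right|}{u^TMu} \le \varepsilon_m.$$

By the argument above, $\Sigma \in M_{\delta,m}$ on the event $E_1$. In particular, on $E_1$, $M_{\delta,m}$ is nonempty. Define the estimated covariance matrix

$$\hat\Sigma_m^{(\delta)} = \begin{cases} \text{any element of } M_{\delta,m} & \text{if } M_{\delta,m} \ne \emptyset, \\ 0 & \text{otherwise.} \end{cases}$$

Since on $E_1$ both $\hat\Sigma_m^{(\delta)}$ and $\Sigma$ are in $M_{\delta,m}$, on this event, we have

$$\left(u^T\Sigma u\right)\frac{1-\varepsilon_m}{1+\varepsilon_m} \le u^T\hat\Sigma_m^{(\delta)}u \le \left(u^T\Sigma u\right)\frac{1+\varepsilon_m}{1-\varepsilon_m} \quad \text{for all } u \in N_\gamma. \qquad (5)$$

Now compute the spectral decomposition

$$\hat\Sigma_m^{(\delta)} = \sum_{i=1}^d \hat\lambda_i \hat v_i \hat v_i^T,$$

where $\hat\lambda_1 \ge \cdots \ge \hat\lambda_d \ge 0$ are the eigenvalues and $\hat v_1,\dots,\hat v_d$ the corresponding orthogonal eigenvectors. Let $u \in S^{d-1}$ be arbitrary and let $v$ be a point in $N_\gamma$ with smallest distance to $u$. Then

$$u^T\hat\Sigma_m^{(\delta)}u = v^T\hat\Sigma_m^{(\delta)}v + 2(u-v)^T\hat\Sigma_m^{(\delta)}v + (u-v)^T\hat\Sigma_m^{(\delta)}(u-v) \le v^T\hat\Sigma_m^{(\delta)}v + \hat\lambda_1(2\gamma+\gamma^2) \qquad (6)$$

(by Cauchy-Schwarz and using the fact that $\|u-v\| \le \gamma$)

$$\le (v^T\Sigma v)\frac{1+\varepsilon_m}{1-\varepsilon_m} + 3\gamma\hat\lambda_1 \quad \text{(by (5))} \quad \le \frac{1+\varepsilon_m}{1-\varepsilon_m}\lambda_{\max} + 3\gamma\hat\lambda_1.$$

In particular, on $E_1$ we have $\hat\lambda_1 \le \beta\lambda_{\max}$ where $\beta = \frac{1+\varepsilon_m}{1-\varepsilon_m}\big/(1-3\gamma) < 1.1$.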
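The directional variance estimate $V_m^{(\delta)}(u)$ described above, built from paired differences and a median of block means, can be sketched as follows. The function name `directional_variance_mom` is hypothetical; the caller is assumed to pass the adjusted confidence parameter (in the text, $\delta/(s(4/\gamma)^d)$), and consecutive blocks are used as the partition.

```python
import math
import numpy as np

def directional_variance_mom(X, u, delta):
    """Median-of-means estimate of u^T Sigma u from the rows of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    u = np.asarray(u, dtype=float)
    m = X.shape[0] - (X.shape[0] % 2)   # use an even number of observations
    half = m // 2
    # Each (1/2) (u^T (X_j - X_{half + j}))^2 has expectation u^T Sigma u.
    z = 0.5 * ((X[:half] - X[half:m]) @ u) ** 2
    # Median of means over b = ceil(ln(1/delta)) blocks, clipped to [1, half // 2].
    b = max(1, min(math.ceil(math.log(1.0 / delta)), half // 2))
    block_means = np.sort([blk.mean() for blk in np.array_split(z, b)])
    return float(block_means[(b - 1) // 2])
```

Pairing the observations removes the unknown mean $\mu$ from the estimate, at the cost of halving the number of terms available to the median-of-means step.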

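By a similar argument, we have that for any $u \in S^{d-1}$, if $v$ is the point in $N_\gamma$ with smallest distance to $u$, then on $E_1$,

$$u^T\Sigma u \le (v^T\hat\Sigma_m^{(\delta)}v)\frac{1+\varepsilon_m}{1-\varepsilon_m} + 3\gamma\lambda_{\max} \le \frac{1+\varepsilon_m}{1-\varepsilon_m}\hat\lambda_1 + 3\gamma\lambda_{\max}.$$

In particular, $\lambda_{\max} \le \beta\hat\lambda_1 \le (4/3)\hat\lambda_1$. Similarly,

$$u^T\Sigma u \ge (v^T\hat\Sigma_m^{(\delta)}v)\frac{1-\varepsilon_m}{1+\varepsilon_m} - 3\gamma\hat\lambda_1 \ge \left(u^T\hat\Sigma_m^{(\delta)}u - 3\gamma\hat\lambda_1\right)\frac{1-\varepsilon_m}{1+\varepsilon_m} - 3\gamma\hat\lambda_1 \ge \left(u^T\hat\Sigma_m^{(\delta)}u\right)\frac{1-\varepsilon_m}{1+\varepsilon_m} - 6\gamma\hat\lambda_1. \qquad (7)$$

Let $\hat d_1$ be the number of eigenvalues $\hat\lambda_i$ that are at least $\hat\lambda_1/2$ and let $V_1$ be the subspace of $\mathbb{R}^d$ spanned by $\hat v_1,\dots,\hat v_{\hat d_1}$. Denote by $\Pi_1(X)$ the orthogonal projection of the random variable $X$ (independent of the $X_i$ used to build $V_1$) onto $V_1$. Then for any $u \in V_1 \cap S^{d-1}$, on the event $E_1$, by (7),

$$u^T\Sigma u \ge \frac{\hat\lambda_1}{2}\left(\frac{1-\varepsilon_m}{1+\varepsilon_m} - 12\gamma\right) \ge \frac{\hat\lambda_1}{3}$$

and therefore

$$\mathbb{E}\,u^T(\Pi_1(X) - \mathbb{E}\Pi_1(X))(\Pi_1(X) - \mathbb{E}\Pi_1(X))^Tu = u^T\Sigma u \in \left(\frac{\hat\lambda_1}{3}, \frac{4\hat\lambda_1}{3}\right).$$

In particular, the ratio of the largest and smallest eigenvalues of the covariance matrix of $\Pi_1(X)$ is at most 4 and therefore the distribution of $\Pi_1(X)$ is 4-spherical.

On the other hand, on the event $E_1$, for any unit vector $u \in V_1^\perp \cap S^{d-1}$ in the orthogonal complement of $V_1$, we have $u^T\Sigma u \le 2\lambda_{\max}/3$. To see this, note that $u^T\hat\Sigma_m^{(\delta)}u \le \hat\lambda_1/2$ and therefore, denoting by $v$ the point in $N_\gamma$ closest to $u$,

$$u^T\Sigma u = u^T\hat\Sigma_m^{(\delta)}u + v^T\left(\Sigma - \hat\Sigma_m^{(\delta)}\right)v + \left(v^T\hat\Sigma_m^{(\delta)}v - u^T\hat\Sigma_m^{(\delta)}u\right) + \left(u^T\Sigma u - v^T\Sigma v\right) \le \frac{\hat\lambda_1}{2} + 2\varepsilon_m\lambda_{\max} + 3\gamma\hat\lambda_1 + 3\gamma\lambda_{\max}$$

(by (5), (6), and a similar argument for the last term)

$$\le \lambda_{\max}\left(\beta\left(\frac12 + 3\gamma\right) + 2\varepsilon_m + 3\gamma\right) \le \frac{2\lambda_{\max}}{3}.$$

In other words, the largest eigenvalue of the covariance matrix of $\Pi_1^\perp(X)$ (the projection of $X$ to the subspace $V_1^\perp$) is at most $(2/3)\lambda_{\max}$.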
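The numerical constants used in this step can be checked directly. The sketch below plugs in the worst case $\varepsilon_m = 1/100$ (the text guarantees $\varepsilon_m < 1/100$, and each of the three quantities only improves for smaller $\varepsilon_m$) together with $\gamma = 1/100$, and uses $\beta = \frac{1+\varepsilon_m}{1-\varepsilon_m}/(1-3\gamma)$. The variable names are chosen here for illustration.

```python
# Worst-case values allowed by the assumptions of Section 3.
eps, gamma = 1 / 100, 1 / 100

# beta controls the ratio between the estimated and the true largest eigenvalue.
beta = ((1 + eps) / (1 - eps)) / (1 - 3 * gamma)

# Lower bound factor for u^T Sigma u on V_1 (should be at least 1/3).
lower = 0.5 * ((1 - eps) / (1 + eps) - 12 * gamma)

# Upper bound factor for u^T Sigma u on the complement of V_1 (should be at most 2/3).
upper = beta * (0.5 + 3 * gamma) + 2 * eps + 3 * gamma

print(beta, lower, upper)
```

With these values $\beta$ is roughly 1.05, the lower factor is roughly 0.43, and the upper factor is roughly 0.61, so the three inequalities used in the text hold with some room to spare.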

In the next step we construct the subspace $V_2 \subset V_1^\perp$. To this end, we proceed exactly as in the first step but now we replace $\mathbb{R}^d$ by $V_1^\perp$ and the sample $X_1,\dots,X_m$ of the first block by the variables $\Pi_1^\perp(X_{m+1}),\dots,\Pi_1^\perp(X_{2m}) \in V_1^\perp$. (Recall that $\Pi_1^\perp(X_i)$ is the projection of $X_i$ to the subspace $V_1^\perp$.) Just like in the first step, with probability at least $1-\delta/s$ we obtain a (possibly empty) subspace $V_2$, orthogonal to $V_1$, such that $\Pi_2(X)$, the projection of $X$ on $V_2$, has a 4-spherical distribution and the largest eigenvalue of the covariance matrix of $\Pi_2^\perp(X)$ (the projection of $X$ to the subspace $(V_1 \oplus V_2)^\perp$) is at most $(2/3)^2\lambda_{\max}$.

We repeat the procedure $s$ times and use a union bound on the $s$ events. We obtain, with probability at least $1-\delta$, a sequence of subspaces $V_1,\dots,V_s$, with the following properties:

(i) $V_1,\dots,V_s$ are orthogonal subspaces.
(ii) For each $i = 1,\dots,s$, $\Pi_i(X)$, the projection of $X$ on $V_i$, has a 4-spherical distribution.
(iii) The largest eigenvalue $\lambda_1^{(i)}$ of the covariance matrix of $\Pi_i(X)$ satisfies $\lambda_1^{(i)} \le (2/3)^{i-1}\lambda_{\max}$.
(iv) The largest eigenvalue $\hat\lambda_1^{(i)}$ of the estimated covariance matrix of $\Pi_i(X)$ satisfies $(3/4)\lambda_1^{(i)} \le \hat\lambda_1^{(i)} \le 1.1\lambda_1^{(i)}$.

Note that it may happen that for some $T < s$, we have $\mathbb{R}^d = V_1 \oplus \cdots \oplus V_T$. In that case we define $V_{T+1} = \cdots = V_s = \emptyset$.

4 Putting it all together

In this section we construct our final multivariate mean estimator and prove Theorem 1. To simplify notation, we assume that the sample size is $2n$. This only affects the value of the universal constant $C$ in the statement of the theorem.

The data is split into two equal halves $(X_1,\dots,X_n)$ and $(X_{n+1},\dots,X_{2n})$. The second half is used to construct the orthogonal spaces $V_1,\dots,V_s$ as described in the previous section. Let $\hat d_1,\dots,\hat d_s$ denote the dimensions of these subspaces. Recall that, with probability at least $1-\delta$, the construction is successful in the sense that the subspaces satisfy properties (i)-(iv) described at the end of the previous section. Denote this event by $E$.
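The sequential construction of $V_1,\dots,V_s$ can be sketched as follows. This is a simplified illustration and not the estimator analyzed in the text: the ordinary sample covariance of each block stands in for the robust estimate $\hat\Sigma_m^{(\delta)}$ (which is defined through a cover of the sphere and is not computed here), the function name `sequential_decomposition` is hypothetical, and every block is assumed to contain at least two observations.

```python
import numpy as np

def sequential_decomposition(X, s):
    """Return orthonormal bases of V_1, ..., V_s and of the remaining complement."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    blocks = np.array_split(X, s)        # one block of observations per step
    complement = np.eye(d)               # columns span the current complement
    bases = []
    for block in blocks:
        if complement.shape[1] == 0:     # R^d is exhausted: remaining V_i are empty
            bases.append(np.zeros((d, 0)))
            continue
        Z = block @ complement           # block expressed in complement coordinates
        # Stand-in (assumption): sample covariance instead of the robust estimate.
        cov = np.atleast_2d(np.cov(Z, rowvar=False))
        eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
        keep = eigvals >= eigvals[-1] / 2        # eigenvalues at least half the largest
        bases.append(complement @ eigvecs[:, keep])
        complement = complement @ eigvecs[:, ~keep]
    return bases, complement
```

Each step keeps the eigen-directions whose estimated eigenvalue is at least half the largest one, so the kept subspace is roughly spherical, and the next step works only inside the orthogonal complement using a fresh block.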
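In the rest of the argument we condition on $(X_{n+1},\dots,X_{2n})$ and assume that $E$ occurs. All probabilities below are conditional.

If $\sum_{i=1}^s \hat d_i < d$ (i.e., $V_1 \oplus \cdots \oplus V_s \ne \mathbb{R}^d$), then we define $V_{s+1} = (V_1 \oplus \cdots \oplus V_s)^\perp$ and denote by $\hat d_{s+1} = d - \sum_{i=1}^s \hat d_i$ the dimension of $V_{s+1}$. Let $\Pi_1,\dots,\Pi_{s+1}$ denote the projection operators on the subspaces $V_1,\dots,V_{s+1}$, respectively. For each $i = 1,\dots,s+1$, we use the vectors $\Pi_i(X_1),\dots,\Pi_i(X_n)$ to compute an estimator of the mean $\mathbb{E}[\Pi_i(X) \mid (X_{n+1},\dots,X_{2n})] = \Pi_i(\mu)$.

For $i = 1,\dots,s$, we use the estimator defined in Section 2. In particular, within the $\hat d_i$-dimensional space $V_i$, we compute

$$\hat\mu_i = \hat\mu_{n,(4/3)\hat\lambda_1^{(i)}}^{(\delta/(s+1))}.$$

Note that since $\hat\lambda_1^{(i)}$ comes from an empirical estimation of $\Sigma$ restricted to an empirical subspace $V_i$, $\hat\mu_i$ is an estimator constructed on the sample $X_1,\dots,X_n$. Then, by Proposition 1, with probability $1-\delta/(s+1)$,

$$\|\hat\mu_i - \Pi_i(\mu)\|^2 \le (8e)^2\,\frac{(8/3)\,\hat\lambda_1^{(i)}\left(\hat d_i\ln 8 + \ln\!\left(e(2\log_{3/2} d + 1)/\delta\right)\right)}{n}.$$

In the last subspace $V_{s+1}$, we may use Minsker's estimator, based on $\Pi_{s+1}(X_1),\dots,\Pi_{s+1}(X_n)$, to compute an estimator $\hat\mu_{s+1} = \tilde\mu_n^{(\delta/(s+1))}$ of $\Pi_{s+1}(\mu)$. Since the largest eigenvalue of the covariance matrix of $\Pi_{s+1}(X)$ is at most $\lambda_{\max}/d^2$, using (3), we obtain that, with probability $1-\delta/(s+1)$,

$$\|\hat\mu_{s+1} - \Pi_{s+1}(\mu)\|^2 \le C\,\frac{\lambda_{\max}\log\!\left((2\log_{3/2} d + 1)/\delta\right)}{n}.$$

Our final estimator is $\hat\mu_n^{(\delta)} = \sum_{i=1}^{s+1}\hat\mu_i$. By the union bound, we have that, with probability at least $1-\delta$,

$$\left\|\hat\mu_n^{(\delta)} - \mu\right\|^2 = \sum_{i=1}^{s+1}\|\hat\mu_i - \Pi_i(\mu)\|^2 \le (8e)^2\,(8/3)\ln 8\,\frac{\sum_{i=1}^s \hat\lambda_1^{(i)}\hat d_i}{n} + (8e)^2\,(8/3)\ln\!\left(e(2\log_{3/2} d + 1)/\delta\right)\frac{\sum_{i=1}^s \hat\lambda_1^{(i)}}{n} + C\,\frac{\lambda_{\max}\log\!\left((2\log_{3/2} d + 1)/\delta\right)}{n}.$$

First notice that, by properties (iii) and (iv) at the end of the previous section,

$$\sum_{i=1}^s \hat\lambda_1^{(i)} \le 1.1\,\lambda_{\max}\sum_{i=1}^s (2/3)^{i-1} \le 3.3\,\lambda_{\max}.$$

On the other hand, since

$$\mathrm{Tr}(\Sigma) = \mathbb{E}\|X-\mu\|^2 = \sum_{i=1}^{s+1}\mathbb{E}\|\Pi_i(X) - \Pi_i(\mu)\|^2$$

and for $i \le s$ each $\Pi_i(X)$ has a 4-spherical distribution, we have that

$$\sum_{i=1}^s \hat\lambda_1^{(i)}\hat d_i \le 1.1\sum_{i=1}^s \lambda_1^{(i)}\hat d_i \le 4.4\,\mathrm{Tr}(\Sigma).$$

This concludes the proof of Theorem 1.

References

[1] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58.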

[2] S. Bubeck, N. Cesa-Bianchi, and G. Lugosi. Bandits with heavy tail. IEEE Transactions on Information Theory, 59.
[3] O. Catoni. Challenging the empirical mean and empirical variance: a deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 48(4).
[4] O. Catoni. PAC-Bayesian bounds for the Gram matrix and least squares regression with a random design. arXiv preprint.
[5] L. Devroye, M. Lerasle, G. Lugosi, and R.I. Oliveira. Sub-Gaussian mean estimators. Annals of Statistics.
[6] I. Giulini. Robust dimension-free Gram operator estimates. arXiv preprint.
[7] D.L. Hanson and F.T. Wright. A bound on tail probabilities for quadratic forms in independent random variables. Annals of Mathematical Statistics, 42.
[8] D. Hsu. Robust statistics.
[9] D. Hsu and S. Sabato. Loss minimization and parameter estimation with heavy tails. Journal of Machine Learning Research, 17:1-40.
[10] M. Jerrum, L. Valiant, and V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43.
[11] M. Lerasle and R. I. Oliveira. Robust empirical mean estimators. arXiv preprint.
[12] J. Matoušek. Lectures on discrete geometry. Springer.
[13] S. Minsker. Geometric median and robust estimation in Banach spaces. Bernoulli, 21.
[14] A.S. Nemirovsky and D.B. Yudin. Problem complexity and method efficiency in optimization.


More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4. 4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad

More information

An almost sure invariance principle for trimmed sums of random vectors

An almost sure invariance principle for trimmed sums of random vectors Proc. Idia Acad. Sci. Math. Sci. Vol. 20, No. 5, November 200, pp. 6 68. Idia Academy of Scieces A almost sure ivariace priciple for trimmed sums of radom vectors KE-ANG FU School of Statistics ad Mathematics,

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices Radom Matrices with Blocks of Itermediate Scale Strogly Correlated Bad Matrices Jiayi Tog Advisor: Dr. Todd Kemp May 30, 07 Departmet of Mathematics Uiversity of Califoria, Sa Diego Cotets Itroductio Notatio

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

Riesz-Fischer Sequences and Lower Frame Bounds

Riesz-Fischer Sequences and Lower Frame Bounds Zeitschrift für Aalysis ud ihre Aweduge Joural for Aalysis ad its Applicatios Volume 1 (00), No., 305 314 Riesz-Fischer Sequeces ad Lower Frame Bouds P. Casazza, O. Christese, S. Li ad A. Lider Abstract.

More information

Sieve Estimators: Consistency and Rates of Convergence

Sieve Estimators: Consistency and Rates of Convergence EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes

More information

Regression with quadratic loss

Regression with quadratic loss Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 3

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 3 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe

More information

A constructive analysis of convex-valued demand correspondence for weakly uniformly rotund and monotonic preference

A constructive analysis of convex-valued demand correspondence for weakly uniformly rotund and monotonic preference MPRA Muich Persoal RePEc Archive A costructive aalysis of covex-valued demad correspodece for weakly uiformly rotud ad mootoic preferece Yasuhito Taaka ad Atsuhiro Satoh. May 04 Olie at http://mpra.ub.ui-mueche.de/55889/

More information

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15 17. Joit distributios of extreme order statistics Lehma 5.1; Ferguso 15 I Example 10., we derived the asymptotic distributio of the maximum from a radom sample from a uiform distributio. We did this usig

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

A Note on Matrix Rigidity

A Note on Matrix Rigidity A Note o Matrix Rigidity Joel Friedma Departmet of Computer Sciece Priceto Uiversity Priceto, NJ 08544 Jue 25, 1990 Revised October 25, 1991 Abstract I this paper we give a explicit costructio of matrices

More information

5.1 Review of Singular Value Decomposition (SVD)

5.1 Review of Singular Value Decomposition (SVD) MGMT 69000: Topics i High-dimesioal Data Aalysis Falll 06 Lecture 5: Spectral Clusterig: Overview (cotd) ad Aalysis Lecturer: Jiamig Xu Scribe: Adarsh Barik, Taotao He, September 3, 06 Outlie Review of

More information

Lecture 33: Bootstrap

Lecture 33: Bootstrap Lecture 33: ootstrap Motivatio To evaluate ad compare differet estimators, we eed cosistet estimators of variaces or asymptotic variaces of estimators. This is also importat for hypothesis testig ad cofidece

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

Central limit theorem and almost sure central limit theorem for the product of some partial sums

Central limit theorem and almost sure central limit theorem for the product of some partial sums Proc. Idia Acad. Sci. Math. Sci. Vol. 8, No. 2, May 2008, pp. 289 294. Prited i Idia Cetral it theorem ad almost sure cetral it theorem for the product of some partial sums YU MIAO College of Mathematics

More information

Asymptotic distribution of products of sums of independent random variables

Asymptotic distribution of products of sums of independent random variables Proc. Idia Acad. Sci. Math. Sci. Vol. 3, No., May 03, pp. 83 9. c Idia Academy of Scieces Asymptotic distributio of products of sums of idepedet radom variables YANLING WANG, SUXIA YAO ad HONGXIA DU ollege

More information

1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by

More information

A survey on penalized empirical risk minimization Sara A. van de Geer

A survey on penalized empirical risk minimization Sara A. van de Geer A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the

More information

It should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable.

It should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable. Chapter 10 Variace Estimatio 10.1 Itroductio Variace estimatio is a importat practical problem i survey samplig. Variace estimates are used i two purposes. Oe is the aalytic purpose such as costructig

More information

Homework Set #3 - Solutions

Homework Set #3 - Solutions EE 15 - Applicatios of Covex Optimizatio i Sigal Processig ad Commuicatios Dr. Adre Tkaceko JPL Third Term 11-1 Homework Set #3 - Solutios 1. a) Note that x is closer to x tha to x l i the Euclidea orm

More information

Chapter 11 Output Analysis for a Single Model. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation

Chapter 11 Output Analysis for a Single Model. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation Chapter Output Aalysis for a Sigle Model Baks, Carso, Nelso & Nicol Discrete-Evet System Simulatio Error Estimatio If {,, } are ot statistically idepedet, the S / is a biased estimator of the true variace.

More information

Lecture 16: Achieving and Estimating the Fundamental Limit

Lecture 16: Achieving and Estimating the Fundamental Limit EE378A tatistical igal Processig Lecture 6-05/25/207 Lecture 6: Achievig ad Estimatig the Fudametal Limit Lecturer: Jiatao Jiao cribe: William Clary I this lecture, we formally defie the two distict problems

More information

5.1. The Rayleigh s quotient. Definition 49. Let A = A be a self-adjoint matrix. quotient is the function. R(x) = x,ax, for x = 0.

5.1. The Rayleigh s quotient. Definition 49. Let A = A be a self-adjoint matrix. quotient is the function. R(x) = x,ax, for x = 0. 40 RODICA D. COSTIN 5. The Rayleigh s priciple ad the i priciple for the eigevalues of a self-adjoit matrix Eigevalues of self-adjoit matrices are easy to calculate. This sectio shows how this is doe usig

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

Algebra of Least Squares

Algebra of Least Squares October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal

More information

On the convergence rates of Gladyshev s Hurst index estimator

On the convergence rates of Gladyshev s Hurst index estimator Noliear Aalysis: Modellig ad Cotrol, 2010, Vol 15, No 4, 445 450 O the covergece rates of Gladyshev s Hurst idex estimator K Kubilius 1, D Melichov 2 1 Istitute of Mathematics ad Iformatics, Vilius Uiversity

More information

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

Math 525: Lecture 5. January 18, 2018

Math 525: Lecture 5. January 18, 2018 Math 525: Lecture 5 Jauary 18, 2018 1 Series (review) Defiitio 1.1. A sequece (a ) R coverges to a poit L R (writte a L or lim a = L) if for each ǫ > 0, we ca fid N such that a L < ǫ for all N. If the

More information

ON WELLPOSEDNESS QUADRATIC FUNCTION MINIMIZATION PROBLEM ON INTERSECTION OF TWO ELLIPSOIDS * M. JA]IMOVI], I. KRNI] 1.

ON WELLPOSEDNESS QUADRATIC FUNCTION MINIMIZATION PROBLEM ON INTERSECTION OF TWO ELLIPSOIDS * M. JA]IMOVI], I. KRNI] 1. Yugoslav Joural of Operatios Research 1 (00), Number 1, 49-60 ON WELLPOSEDNESS QUADRATIC FUNCTION MINIMIZATION PROBLEM ON INTERSECTION OF TWO ELLIPSOIDS M. JA]IMOVI], I. KRNI] Departmet of Mathematics

More information

Estimation of the essential supremum of a regression function

Estimation of the essential supremum of a regression function Estimatio of the essetial supremum of a regressio fuctio Michael ohler, Adam rzyżak 2, ad Harro Walk 3 Fachbereich Mathematik, Techische Uiversität Darmstadt, Schlossgartestr. 7, 64289 Darmstadt, Germay,

More information

STAT Homework 1 - Solutions

STAT Homework 1 - Solutions STAT-36700 Homework 1 - Solutios Fall 018 September 11, 018 This cotais solutios for Homework 1. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better

More information

Estimation of the Mean and the ACVF

Estimation of the Mean and the ACVF Chapter 5 Estimatio of the Mea ad the ACVF A statioary process {X t } is characterized by its mea ad its autocovariace fuctio γ ), ad so by the autocorrelatio fuctio ρ ) I this chapter we preset the estimators

More information

Approximation theorems for localized szász Mirakjan operators

Approximation theorems for localized szász Mirakjan operators Joural of Approximatio Theory 152 (2008) 125 134 www.elsevier.com/locate/jat Approximatio theorems for localized szász Miraja operators Lise Xie a,,1, Tigfa Xie b a Departmet of Mathematics, Lishui Uiversity,

More information

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,

More information

Available online at J. Math. Comput. Sci. 2 (2012), No. 3, ISSN:

Available online at   J. Math. Comput. Sci. 2 (2012), No. 3, ISSN: Available olie at http://scik.org J. Math. Comput. Sci. 2 (202, No. 3, 656-672 ISSN: 927-5307 ON PARAMETER DEPENDENT REFINEMENT OF DISCRETE JENSEN S INEQUALITY FOR OPERATOR CONVEX FUNCTIONS L. HORVÁTH,

More information

Accuracy Assessment for High-Dimensional Linear Regression

Accuracy Assessment for High-Dimensional Linear Regression Uiversity of Pesylvaia ScholarlyCommos Statistics Papers Wharto Faculty Research -016 Accuracy Assessmet for High-Dimesioal Liear Regressio Toy Cai Uiversity of Pesylvaia Zijia Guo Uiversity of Pesylvaia

More information

Supplementary Materials for Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting

Supplementary Materials for Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting Supplemetary Materials for Statistical-Computatioal Phase Trasitios i Plated Models: The High-Dimesioal Settig Yudog Che The Uiversity of Califoria, Berkeley yudog.che@eecs.berkeley.edu Jiamig Xu Uiversity

More information

Learning Theory: Lecture Notes

Learning Theory: Lecture Notes Learig Theory: Lecture Notes Kamalika Chaudhuri October 4, 0 Cocetratio of Averages Cocetratio of measure is very useful i showig bouds o the errors of machie-learig algorithms. We will begi with a basic

More information

Rademacher Complexity

Rademacher Complexity EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for

More information

Law of the sum of Bernoulli random variables

Law of the sum of Bernoulli random variables Law of the sum of Beroulli radom variables Nicolas Chevallier Uiversité de Haute Alsace, 4, rue des frères Lumière 68093 Mulhouse icolas.chevallier@uha.fr December 006 Abstract Let be the set of all possible

More information

Disjoint Systems. Abstract

Disjoint Systems. Abstract Disjoit Systems Noga Alo ad Bey Sudaov Departmet of Mathematics Raymod ad Beverly Sacler Faculty of Exact Scieces Tel Aviv Uiversity, Tel Aviv, Israel Abstract A disjoit system of type (,,, ) is a collectio

More information

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where

More information

Probabilistic and Average Linear Widths in L -Norm with Respect to r-fold Wiener Measure

Probabilistic and Average Linear Widths in L -Norm with Respect to r-fold Wiener Measure joural of approximatio theory 84, 3140 (1996) Article No. 0003 Probabilistic ad Average Liear Widths i L -Norm with Respect to r-fold Wieer Measure V. E. Maiorov Departmet of Mathematics, Techio, Haifa,

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

Notes 19 : Martingale CLT

Notes 19 : Martingale CLT Notes 9 : Martigale CLT Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: [Bil95, Chapter 35], [Roc, Chapter 3]. Sice we have ot ecoutered weak covergece i some time, we first recall

More information

Introduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT

Introduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT Itroductio to Extreme Value Theory Laures de Haa, ISM Japa, 202 Itroductio to Extreme Value Theory Laures de Haa Erasmus Uiversity Rotterdam, NL Uiversity of Lisbo, PT Itroductio to Extreme Value Theory

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

Chapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities

Chapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities Chapter 5 Iequalities 5.1 The Markov ad Chebyshev iequalities As you have probably see o today s frot page: every perso i the upper teth percetile ears at least 1 times more tha the average salary. I other

More information

This section is optional.

This section is optional. 4 Momet Geeratig Fuctios* This sectio is optioal. The momet geeratig fuctio g : R R of a radom variable X is defied as g(t) = E[e tx ]. Propositio 1. We have g () (0) = E[X ] for = 1, 2,... Proof. Therefore

More information

A New Solution Method for the Finite-Horizon Discrete-Time EOQ Problem

A New Solution Method for the Finite-Horizon Discrete-Time EOQ Problem This is the Pre-Published Versio. A New Solutio Method for the Fiite-Horizo Discrete-Time EOQ Problem Chug-Lu Li Departmet of Logistics The Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog Phoe: +852-2766-7410

More information

32 estimating the cumulative distribution function

32 estimating the cumulative distribution function 32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution Iteratioal Mathematical Forum, Vol., 3, o. 3, 3-53 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.9/imf.3.335 Double Stage Shrikage Estimator of Two Parameters Geeralized Expoetial Distributio Alaa M.

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information