On the estimation of the mean of a random vector
Emilie Joly
Université Paris Ouest Nanterre, France; emilie.joly@u-paris10.fr

Gábor Lugosi
ICREA and Department of Economics, Pompeu Fabra University, Barcelona, Spain; gabor.lugosi@upf.edu

Roberto Imbuzeiro Oliveira
IMPA, Rio de Janeiro, RJ, Brazil; rimfo@impa.br

July 8, 2016

Supported by the French Agence Nationale de la Recherche (ANR), under grant ANR-13-BS (project SPADRO). Supported by the Spanish Ministry of Economy and Competitiveness, Grant MTM P and FEDER, EU. Support from CNPq, Brazil via Ciência sem Fronteiras grant # / Supported by a Bolsa de Produtividade em Pesquisa from CNPq, Brazil. Supported by FAPESP Center for Neuromathematics (grant# 2013/ , FAPESP - S. Paulo Research Foundation).

Abstract

We study the problem of estimating the mean of a multivariate distribution based on independent samples. The main result is the proof of existence of an estimator with a non-asymptotic sub-Gaussian performance for all distributions satisfying some mild moment assumptions.

1 Introduction

Let $X$ be a random vector taking values in $\mathbb{R}^d$. We assume throughout the paper that the mean vector $\mu = \mathbb{E}X$ and covariance matrix $\Sigma = \mathbb{E}(X-\mu)(X-\mu)^T$ exist. Given $n$ independent, identically distributed samples $X_1,\dots,X_n$ drawn from the distribution of $X$, one wishes to estimate the mean vector. A natural and popular choice is the sample mean $(1/n)\sum_{i=1}^n X_i$ that is known to have a near-optimal behavior whenever the distribution is sufficiently light tailed. However,
whenever heavy tails are a concern, the sample mean is to be avoided as it may have a suboptimal performance. While the one-dimensional case (i.e., $d = 1$) is quite well understood (see [3], [5]), various aspects of the multidimensional problem are still to be revealed. This paper aims at contributing to the understanding of the multi-dimensional case.

Before stating the main results, we briefly survey properties of some mean estimators of real-valued random variables. Some of these techniques serve as basic building blocks for the estimators we propose for the vector-valued case.

1.1 Estimating the mean of a real-valued random variable

When $d = 1$, the simplest and most popular mean estimator is the sample mean $\overline{\mu}_n = (1/n)\sum_{i=1}^n X_i$. The sample mean is unbiased and the central limit theorem guarantees an asymptotically Gaussian distribution. However, unless the distribution of $X$ has a light (e.g., sub-Gaussian) tail, there are no non-asymptotic sub-Gaussian performance guarantees for $\overline{\mu}_n$. We refer the reader to Catoni [3] for details. Perhaps surprisingly, there exist estimators of $\mu$ with much better concentration properties, see Catoni [3] and Devroye, Lerasle, Lugosi, and Oliveira [5].

A conceptually simple and quite powerful estimator is the so-called median-of-means estimator that has been proposed, in different forms, in various papers, see Nemirovsky and Yudin [14], Hsu [8], Jerrum, Valiant, and Vazirani [10], Alon, Matias, and Szegedy [1]. The median-of-means estimator is defined as follows. Given a positive integer $b$ and $x_1,\dots,x_b \in \mathbb{R}$, let $q_{1/2}$ denote the median of these numbers, that is,
\[
q_{1/2}(x_1,\dots,x_b) = x_i, \quad \text{where } \#\{k \in [b] : x_k \le x_i\} \ge \frac{b}{2} \text{ and } \#\{k \in [b] : x_k \ge x_i\} \ge \frac{b}{2}.
\]
(If several $i$ fit the above description, we take the smallest one.) For any fixed $\delta \in [e^{1-n/2}, 1)$, first choose $b = \lceil \ln(1/\delta) \rceil$ and note that $b \le n/2$ holds. Next, partition $[n] = \{1,\dots,n\}$ into $b$ blocks $B_1,\dots,B_b$, each of size $|B_i| \ge \lfloor n/b \rfloor \ge 2$.
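Given $X_1,\dots,X_n$, we compute the sample mean in each block,
\[
Y_i = \frac{1}{|B_i|} \sum_{j \in B_i} X_j,
\]
and define the median-of-means estimator by $\hat{\mu}_n^{(\delta)} = q_{1/2}(Y_1,\dots,Y_b)$. One can show (see, e.g., Hsu [8]) that for any $n \ge 4$,
\[
\mathbb{P}\left\{ \left|\hat{\mu}_n^{(\delta)} - \mu\right| > 2e\sqrt{\frac{2\mathrm{Var}(X)\,(1+\ln(1/\delta))}{n}} \right\} \le \delta, \tag{1}
\]
where $\mathrm{Var}(X)$ denotes the variance of $X$. Note that the median-of-means estimator $\hat{\mu}_n^{(\delta)}$ does not require any knowledge of the variance of $X$. However, it depends on the desired confidence level $\delta$ and the partition $B_1,\dots,B_b$. Any partition satisfying $|B_i| \ge \lfloor n/b \rfloor$ for all $i$ is valid in order to get (1). Hence,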
we do not keep the dependence on the partition $B_1,\dots,B_b$ in the notation $\hat{\mu}_n^{(\delta)}$.

Devroye, Lerasle, Lugosi, and Oliveira [5] introduce estimators that work for a large range of confidence levels under some mild assumptions. Catoni [3] introduces estimators of quite different flavor and gets a non-asymptotic result of the same form as (1). Bubeck, Cesa-Bianchi, and Lugosi [2] apply these estimators in the context of bandit problems.

1.2 Estimating the mean of random vectors

Consider now the multi-dimensional case when $d > 1$. The sample mean $\overline{\mu}_n = (1/n)\sum_{i=1}^n X_i$ is still an obvious choice for estimating the mean vector $\mu$. If $X$ has a multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, then $\overline{\mu}_n$ is also multivariate normal with mean $\mu$ and covariance matrix $(1/n)\Sigma$ and therefore, for $\delta \in (0,1)$, with probability at least $1-\delta$,
\[
\left\|\overline{\mu}_n - \mu\right\| \le \sqrt{\frac{\mathrm{Tr}(\Sigma)}{n}} + \sqrt{\frac{2\lambda_{\max}\log(1/\delta)}{n}}, \tag{2}
\]
where $\mathrm{Tr}(\Sigma)$ and $\lambda_{\max}$ denote the trace and largest eigenvalue of the covariance matrix, respectively (Hanson and Wright [7]). For non-Gaussian and possibly heavy-tailed distributions, one cannot expect such a sub-Gaussian behavior of the sample mean. The main goal of this paper is to investigate under what conditions it is possible to define mean estimators that reproduce a (non-asymptotic) sub-Gaussian performance similar to (2).

Lerasle and Oliveira [11], Hsu and Sabato [9], and Minsker [13] extend the median-of-means estimator to more general spaces. In particular, Minsker's results imply that for each $\delta \in (0,1)$ there exists a mean estimator $\widetilde{\mu}_n^{(\delta)}$ and a universal constant $C$ such that, with probability at least $1-\delta$,
\[
\left\|\widetilde{\mu}_n^{(\delta)} - \mu\right\| \le C\sqrt{\frac{\mathrm{Tr}(\Sigma)\log(1/\delta)}{n}}. \tag{3}
\]
While this bound is quite remarkable (note that no assumption other than the existence of the covariance matrix is made), it does not quite achieve a sub-Gaussian performance bound that resembles (2). An instructive example is when all eigenvalues are identical and equal to $\lambda_{\max}$. If the dimension $d$ is large, (2) is of the order of $\sqrt{(\lambda_{\max}/n)(d + \log(\delta^{-1}))}$ while (3) gives the order $\sqrt{(\lambda_{\max}/n)(d\log(\delta^{-1}))}$.
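To get a feeling for the gap between the two orders of magnitude in the isotropic example above, the following minimal sketch evaluates both expressions numerically. The function names and the chosen values of $d$, $n$, and $\delta$ are illustrative assumptions, constants are ignored, and $\log$ is taken to be the natural logarithm.

```python
import math


def subgaussian_order(lam_max, d, n, delta):
    """Order of bound (2) when all d eigenvalues equal lam_max (constants dropped)."""
    return math.sqrt((lam_max / n) * (d + math.log(1.0 / delta)))


def product_order(lam_max, d, n, delta):
    """Order of bound (3) in the same isotropic setting (constants dropped)."""
    return math.sqrt((lam_max / n) * (d * math.log(1.0 / delta)))


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    lam_max, d, n, delta = 1.0, 1000, 100_000, 1e-6
    a = subgaussian_order(lam_max, d, n, delta)
    b = product_order(lam_max, d, n, delta)
    print(f"order of (2): {a:.4f}, order of (3): {b:.4f}, ratio: {b / a:.2f}")
```

The ratio equals $\sqrt{dL/(d+L)}$ with $L = \log(1/\delta)$, so for large $d$ it approaches $\sqrt{\log(1/\delta)}$: the multiplicative loss of (3) relative to (2) grows with the confidence level but not with the dimension.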
The main result of this paper is the construction of a mean estimator that, under some mild moment assumptions, achieves a sub-Gaussian performance bound in the sense of (2). More precisely, we prove the following.

Theorem 1 For all $\delta \in (0,1)$ there exists a mean estimator $\hat{\mu}_n^{(\delta)}$ and a universal constant $C$ such that if $X_1,\dots,X_n$ are i.i.d. random vectors in $\mathbb{R}^d$ with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma$ such that there exists a constant $K > 0$ such that, for all $v \in \mathbb{R}^d$ with $\|v\| = 1$,
\[
\mathbb{E}\left[\left((X-\mu)^T v\right)^4\right] \le K\,(v^T \Sigma v)^2,
\]
then for all $n \ge CK\log d\,(d + \log(1/\delta))$,
\[
\left\|\hat{\mu}_n^{(\delta)} - \mu\right\| \le C\left(\sqrt{\frac{\mathrm{Tr}(\Sigma)}{n}} + \sqrt{\frac{\lambda_{\max}\log(\log d/\delta)}{n}}\right).
\]

The theorem guarantees the existence of a mean estimator whose performance matches the sub-Gaussian bound (2), up to an additional term of the order of $\sqrt{(1/n)\lambda_{\max}\log\log d}$, for all distributions satisfying the fourth-moment assumption given above. The additional term is clearly of minor importance. (For example, it is dominated by the first term whenever $\mathrm{Tr}(\Sigma) > \lambda_{\max}\log\log d$.) With the estimator we construct, this term is inevitable. On the other hand, the inequality of the theorem only holds for sample sizes that are at least a constant times $d\log d$. This feature is not desirable for truly high-dimensional problems, especially taking into account that Minsker's bound is dimension-free.

The fourth-moment assumption can be interpreted as a boundedness assumption on the kurtosis of $(X-\mu)^T v$. The same assumption has been used in Catoni [4] and Giulini [6] for the robust estimation of the Gram matrix. The fourth-moment assumption may be weakened to an analogous $(2+\varepsilon)$-th moment assumption that we do not detail for the clarity of the exposition.

We prove the theorem by constructing an estimator in several steps. First we construct an estimator that performs well for spherical distributions (i.e., for distributions whose covariance matrix has a trace comparable to $d\lambda_{\max}$). This estimator is described in Section 2. In the second step, we decompose the space in a data-dependent way into the orthogonal sum of $O(\log d)$ subspaces such that, for all but one of them, the projection of $X$ to the subspace has a spherical distribution. The last subspace is such that the projection has a covariance matrix with a small trace. In each subspace we apply the first estimator and combine the results to obtain the final estimator $\hat{\mu}_n^{(\delta)}$. The proof below provides an explicit value of the constant $C$, though no attempt has been made to optimize its value.

The constructed estimator is computationally so demanding that even for moderate values of $d$ it is hopeless to compute it in reasonable time.
In this sense, Theorem 1 should be regarded as an existence result. It is an interesting and important challenge to construct estimators with similar statistical performance that can be computed in polynomial time (as a function of $n$ and $d$). Note that the estimator of Minsker cited above may be computed by solving a convex optimization problem, making it computationally feasible, see also Hsu and Sabato [9] for further computational considerations.

2 An estimator for spherical distributions

In this section we construct an estimator that works well whenever the distribution of $X$ is sufficiently spherical in the sense that a positive fraction of the eigenvalues of the covariance matrix is of the same order as $\lambda_{\max}$. More precisely, for $c \ge 1$, we call a distribution $c$-spherical if
\[
d\lambda_{\max} \le c\,\mathrm{Tr}(\Sigma).
\]
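The definition of $c$-sphericity only involves the spectrum of $\Sigma$, so it is easy to check on a given list of eigenvalues. The helper names below are illustrative assumptions, not notation from the paper; this is a minimal sketch.

```python
def sphericity_constant(eigenvalues):
    """Smallest c with d * lambda_max <= c * Tr(Sigma), for a nonzero spectrum."""
    d = len(eigenvalues)
    trace = sum(eigenvalues)
    if d == 0 or trace <= 0:
        raise ValueError("need a nonempty spectrum with positive trace")
    return d * max(eigenvalues) / trace


def is_c_spherical(eigenvalues, c):
    """True if the spectrum satisfies d * lambda_max <= c * Tr(Sigma)."""
    return sphericity_constant(eigenvalues) <= c


if __name__ == "__main__":
    print(sphericity_constant([1.0, 1.0, 1.0, 1.0]))       # isotropic case
    print(sphericity_constant([4.0, 1.0, 1.0, 1.0, 1.0]))  # one dominant direction
    print(is_c_spherical([100.0] + [1.0] * 9, 4))           # a single spike
```

An isotropic covariance matrix is 1-spherical, while a spectrum dominated by a single large eigenvalue needs a constant $c$ close to $d$.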
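For each $\delta \in (0,1)$ and unit vector $w \in S^{d-1}$ (where $S^{d-1} = \{x \in \mathbb{R}^d : \|x\| = 1\}$), we may define $m_n^{(\delta)}(w)$ as the median-of-means estimate (as defined in Section 1.1) of $w^T\mu = \mathbb{E}w^TX$ based on the i.i.d. sample $w^TX_1,\dots,w^TX_n$. Let $N_{1/2} \subset S^{d-1}$ be a minimal $1/2$-cover, that is, a set of smallest cardinality that has the property that for all $u \in S^{d-1}$ there exists $w \in N_{1/2}$ with $\|u - w\| \le 1/2$. It is well known (see, e.g., [12]) that $|N_{1/2}| \le 8^d$. Noting that $\mathrm{Var}(w^TX) \le \lambda_{\max}$, by (1) and the union bound, we have that, with probability at least $1-\delta$,
\[
\sup_{w \in N_{1/2}} \left| m_n^{(\delta/8^d)}(w) - w^T\mu \right| \le 2e\sqrt{\frac{2\lambda_{\max}\ln(e8^d/\delta)}{n}}.
\]
In other words, if, for $\lambda > 0$, we define the empirical polytope
\[
P_{\delta,\lambda} = \left\{ x \in \mathbb{R}^d : \sup_{w \in N_{1/2}} \left| m_n^{(\delta/8^d)}(w) - w^Tx \right| \le 2e\sqrt{\frac{2\lambda\ln(e8^d/\delta)}{n}} \right\},
\]
then with probability at least $1-\delta$, $\mu \in P_{\delta,\lambda_{\max}}$. In particular, on this event, $P_{\delta,\lambda_{\max}}$ is nonempty. Suppose that an upper bound $\lambda \ge \lambda_{\max}$ of the largest eigenvalue of the covariance matrix is available. Then we may define the mean estimator
\[
\hat{\mu}_{n,\lambda}^{(\delta)} =
\begin{cases}
\text{any element } y \in P_{\delta,\lambda} & \text{if } P_{\delta,\lambda} \neq \emptyset, \\
0 & \text{otherwise.}
\end{cases}
\]
Now suppose that $\mu \in P_{\delta,\lambda}$ and let $y \in P_{\delta,\lambda}$ be arbitrary. Define $u = (y-\mu)/\|y-\mu\| \in S^{d-1}$, and let $w \in N_{1/2}$ be such that $\|w - u\| \le 1/2$. (Such a $w$ exists by definition of $N_{1/2}$.) Then
\[
\|y-\mu\| = u^T(y-\mu) = (u-w)^T(y-\mu) + w^T(y-\mu) \le \frac{1}{2}\|y-\mu\| + 4e\sqrt{\frac{2\lambda\ln(e8^d/\delta)}{n}},
\]
where we used Cauchy-Schwarz and the fact that $y, \mu \in P_{\delta,\lambda}$. Rearranging, and using $\ln(e8^d/\delta) = d\ln 8 + \ln(e/\delta)$, we obtain that, on the event that $\mu \in P_{\delta,\lambda}$,
\[
\left\|\hat{\mu}_{n,\lambda}^{(\delta)} - \mu\right\| \le 8e\sqrt{\frac{2\lambda\,(d\ln 8 + \ln(e/\delta))}{n}},
\]
provided that $\lambda \ge \lambda_{\max}$. Summarizing, we have proved the following.

Proposition 1 Let $\lambda > 0$ and $\delta \in (0,1)$. For any distribution with mean $\mu$ and covariance matrix $\Sigma$ such that $\lambda_{\max} = \|\Sigma\| \le \lambda$, the estimator $\hat{\mu}_{n,\lambda}^{(\delta)}$ defined above satisfies, with probability at least $1-\delta$,
\[
\left\|\hat{\mu}_{n,\lambda}^{(\delta)} - \mu\right\| \le 8e\sqrt{\frac{2\lambda\,(d\ln 8 + \ln(e/\delta))}{n}}.
\]
In particular, if the distribution is $c$-spherical and $\lambda \le 2\lambda_{\max}$, then
\[
\left\|\hat{\mu}_{n,\lambda}^{(\delta)} - \mu\right\| \le 16e\sqrt{\frac{c\,\mathrm{Tr}(\Sigma)\ln 8 + \lambda_{\max}\ln(e/\delta)}{n}}.
\]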
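The building blocks of this construction can be sketched in a few lines of Python. The code below implements the median-of-means estimate of Section 1.1 (with contiguous blocks, which is one valid partition) and a membership test for the polytope $P_{\delta,\lambda}$. It is only a sketch under stated assumptions: the function names are illustrative, and the `directions` argument is a hypothetical stand-in for the cover $N_{1/2}$, which can have up to $8^d$ points and is not constructed here.

```python
import math


def median_q(values):
    """q_{1/2} of Section 1.1: the first x_i such that at least half of the
    values are <= x_i and at least half are >= x_i."""
    b = len(values)
    for x in values:
        at_most = sum(1 for y in values if y <= x)
        at_least = sum(1 for y in values if y >= x)
        if 2 * at_most >= b and 2 * at_least >= b:
            return x
    raise ValueError("empty input")


def median_of_means(xs, delta):
    """Median-of-means with b = ceil(ln(1/delta)) contiguous blocks."""
    n = len(xs)
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    b = math.ceil(math.log(1.0 / delta))
    if 2 * b > n:
        raise ValueError("delta too small for this sample size (need b <= n/2)")
    # Block k holds indices k*n//b, ..., (k+1)*n//b - 1, so each block
    # has floor(n/b) or ceil(n/b) elements.
    blocks = [xs[k * n // b:(k + 1) * n // b] for k in range(b)]
    return median_q([sum(block) / len(block) for block in blocks])


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def in_polytope(y, points, directions, delta, lam):
    """Check the constraints defining P_{delta,lam} over the given directions
    (a stand-in for the cover N_{1/2}; an assumption of this sketch)."""
    n, d = len(points), len(y)
    radius = 2 * math.e * math.sqrt(2 * lam * math.log(math.e * 8 ** d / delta) / n)
    for w in directions:
        m_w = median_of_means([dot(w, x) for x in points], delta / 8 ** d)
        if abs(m_w - dot(w, y)) > radius:
            return False
    return True
```

A single gross outlier only corrupts one block mean, so `median_of_means([1.0] * 19 + [1000.0], 0.05)` still returns `1.0`, whereas the plain sample mean of the same data is about 51. Finding a point of the polytope, rather than testing a given one, is the computationally demanding part and is not attempted here.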
The bound we obtained has the same sub-Gaussian form as (2), up to a multiplicative constant, whenever the distribution is $c$-spherical. To make the estimator fully data-dependent, we need to find an estimate $\hat{\lambda}$ that falls in the interval $[\lambda_{\max}, 2\lambda_{\max}]$, with high probability. This may be achieved by splitting the sample in two parts of equal size (assuming $n$ is even), estimating $\lambda_{\max}$ using samples from one part and computing the mean estimate defined above using the other part. In the next section we describe such a method as a part of a more general procedure.

3 Empirical eigendecomposition

In the previous section we presented a mean estimate that works well for spherical distributions. We will use this estimator as a building block in the construction of an estimator that has the desirable performance guarantee for distributions with any covariance matrix. In addition to finite covariances, we assume that there exists a constant $K > 0$ such that, for all $v \in \mathbb{R}^d$ with $\|v\| = 1$,
\[
\mathbb{E}\left[\left((X-\mu)^T v\right)^4\right] \le K\,(v^T\Sigma v)^2. \tag{4}
\]
In this section we assume that
\[
n \ge 2(400e)^2 K \log_{3/2} d \left( d\log 25 + \log(2\log_{3/2} d) + \log(1/\delta) \right).
\]

The basic idea is the following. We split the data into two equal halves. We use the first half in order to decompose the space into the sum of orthogonal subspaces such that the projection of $X$ into each subspace is 4-spherical. Then we may estimate the projected means by the estimator of the previous section.

Next we describe how we obtain an orthogonal decomposition of the space based on $n$ i.i.d. observations $X_1,\dots,X_n$. Let $s = \lceil \log_{3/2} d^2 \rceil$ and $m = \lfloor n/s \rfloor$. Divide the sample into $s$ blocks, each of size at least $m$. In what follows, we describe a way of sequentially decomposing $\mathbb{R}^d$ into the orthogonal sum of $s+1$ subspaces $\mathbb{R}^d = V_1 \oplus \cdots \oplus V_{s+1}$. First we construct $V_1$ using the first block $X_1,\dots,X_m$ of observations. Then we use the second block to build $V_2$, and so on, for $s$ blocks.
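The bookkeeping of this splitting step can be sketched as follows. The function names are illustrative assumptions, the blocks are taken to be contiguous (any split with blocks of size at least $m$ would do), and $\log$ in the sample-size condition is assumed to be the natural logarithm.

```python
import math


def num_rounds(d):
    """s = ceil(log_{3/2}(d^2)); requires d >= 2 so that s >= 1."""
    if d < 2:
        raise ValueError("d must be at least 2")
    return math.ceil(math.log(d * d) / math.log(1.5))


def split_into_blocks(sample, s):
    """Split the sample into s contiguous blocks, each of size >= floor(n/s)."""
    n = len(sample)
    if s > n:
        raise ValueError("need at least one observation per block")
    return [sample[k * n // s:(k + 1) * n // s] for k in range(s)]


def required_sample_size(d, K, delta):
    """Right-hand side of the sample-size assumption of Section 3."""
    log32_d = math.log(d) / math.log(1.5)
    return (2 * (400 * math.e) ** 2 * K * log32_d
            * (d * math.log(25) + math.log(2 * log32_d) + math.log(1 / delta)))


if __name__ == "__main__":
    d = 10
    s = num_rounds(d)
    blocks = split_into_blocks(list(range(1000)), s)
    print(s, min(len(b) for b in blocks))
    print(f"{required_sample_size(d, K=3.0, delta=0.01):.3e}")
```

The choice of $s$ ensures $(2/3)^s \le 1/d^2$, which is what later makes the leftover subspace $V_{s+1}$ harmless.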
The key properties we need are that (a) the random vector $X$, projected to any of these subspaces, has a 4-spherical distribution; (b) the largest eigenvalue of the covariance matrix of $X$, projected on $V_i$, is at most $\lambda_{\max}(2/3)^{i-1}$.

To this end, just like in the previous section, let $N_\gamma \subset S^{d-1}$ be a minimal $\gamma$-cover of the unit sphere $S^{d-1}$ for a sufficiently small constant $\gamma \in (0,1)$. The value $\gamma = 1/100$ is sufficient for our purposes and in the sequel we assume this value. Note that $|N_\gamma| \le (4/\gamma)^d$ (see [12] for a proof of this fact).

Initially, we use the first block $X_1,\dots,X_m$. We may assume that $m$ is even. Using these observations, for each $u \in N_\gamma$, we compute an estimate $V_m^{(\delta)}(u)$ of
\[
u^T\Sigma u = \mathbb{E}\left(u^T(X-\mu)\right)^2 = \frac{1}{2}\mathbb{E}\left(u^T(X-X')\right)^2,
\]
where $X'$ is an i.i.d. copy of $X$. We may construct the estimate by forming the $m/2$ i.i.d. random variables
\[
\frac{1}{2}\left(u^T(X_1 - X_{m/2+1})\right)^2, \dots, \frac{1}{2}\left(u^T(X_{m/2} - X_m)\right)^2
\]
and estimating their mean by the median-of-means estimate $V_m^{(\delta)}(u)$ with parameter
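$\delta/(s(4/\gamma)^d)$. Then (1), together with assumption (4), implies that, with probability at least $1-\delta/s$,
\[
\sup_{u \in N_\gamma} \frac{\left| u^T\Sigma u - V_m^{(\delta)}(u) \right|}{u^T\Sigma u} \le 4e\sqrt{\frac{K\log(s(4/\gamma)^d/\delta)}{m}} \stackrel{\mathrm{def.}}{=} \varepsilon_m.
\]
Our assumptions on the sample size guarantee that $\varepsilon_m < 1/100$. The event that the inequality above holds is denoted by $E_1$, so that $\mathbb{P}\{E_1\} \ge 1-\delta/s$.

Let $\mathcal{M}_{\delta,m}$ be the set of all symmetric positive semidefinite $d \times d$ matrices $M$ satisfying
\[
\sup_{u \in N_\gamma} \frac{\left| u^TMu - V_m^{(\delta)}(u) \right|}{u^T\Sigma u} \le \varepsilon_m.
\]
By the argument above, $\Sigma \in \mathcal{M}_{\delta,m}$ on the event $E_1$. In particular, on $E_1$, $\mathcal{M}_{\delta,m}$ is nonempty. Define the estimated covariance matrix
\[
\hat{\Sigma}_m^{(\delta)} =
\begin{cases}
\text{any element of } \mathcal{M}_{\delta,m} & \text{if } \mathcal{M}_{\delta,m} \neq \emptyset, \\
0 & \text{otherwise.}
\end{cases}
\]
Since on $E_1$ both $\hat{\Sigma}_m^{(\delta)}$ and $\Sigma$ are in $\mathcal{M}_{\delta,m}$, on this event we have
\[
\left(u^T\Sigma u\right)\frac{1-\varepsilon_m}{1+\varepsilon_m} \le u^T\hat{\Sigma}_m^{(\delta)}u \le \left(u^T\Sigma u\right)\frac{1+\varepsilon_m}{1-\varepsilon_m} \quad \text{for all } u \in N_\gamma. \tag{5}
\]

Now compute the spectral decomposition
\[
\hat{\Sigma}_m^{(\delta)} = \sum_{i=1}^d \hat{\lambda}_i \hat{v}_i \hat{v}_i^T,
\]
where $\hat{\lambda}_1 \ge \cdots \ge \hat{\lambda}_d \ge 0$ are the eigenvalues and $\hat{v}_1,\dots,\hat{v}_d$ the corresponding orthogonal eigenvectors. Let $u \in S^{d-1}$ be arbitrary and let $v$ be a point in $N_\gamma$ with smallest distance to $u$. Then
\begin{align*}
u^T\hat{\Sigma}_m^{(\delta)}u &= v^T\hat{\Sigma}_m^{(\delta)}v + 2(u-v)^T\hat{\Sigma}_m^{(\delta)}v + (u-v)^T\hat{\Sigma}_m^{(\delta)}(u-v) \\
&\le v^T\hat{\Sigma}_m^{(\delta)}v + \hat{\lambda}_1(2\gamma + \gamma^2) \tag{6} \\
&\qquad \text{(by Cauchy-Schwarz and using the fact that } \|u-v\| \le \gamma\text{)} \\
&\le (v^T\Sigma v)\frac{1+\varepsilon_m}{1-\varepsilon_m} + 3\gamma\hat{\lambda}_1 \qquad \text{(by (5))} \\
&\le \frac{1+\varepsilon_m}{1-\varepsilon_m}\lambda_{\max} + 3\gamma\hat{\lambda}_1.
\end{align*}
In particular, on $E_1$ we have $\hat{\lambda}_1 \le \beta\lambda_{\max}$ where $\beta = \frac{1+\varepsilon_m}{1-\varepsilon_m}/(1-3\gamma) <$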
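By a similar argument, we have that for any $u \in S^{d-1}$, if $v$ is the point in $N_\gamma$ with smallest distance to $u$, then on $E_1$,
\[
u^T\Sigma u \le \left(v^T\hat{\Sigma}_m^{(\delta)}v\right)\frac{1+\varepsilon_m}{1-\varepsilon_m} + 3\gamma\lambda_{\max} \le \frac{1+\varepsilon_m}{1-\varepsilon_m}\hat{\lambda}_1 + 3\gamma\lambda_{\max}.
\]
In particular, $\lambda_{\max} \le \beta\hat{\lambda}_1 \le (4/3)\hat{\lambda}_1$. Similarly,
\begin{align*}
u^T\Sigma u &\ge \left(v^T\hat{\Sigma}_m^{(\delta)}v\right)\frac{1-\varepsilon_m}{1+\varepsilon_m} - 3\gamma\hat{\lambda}_1 \\
&\ge \left(\frac{1-\varepsilon_m}{1+\varepsilon_m}\right)u^T\hat{\Sigma}_m^{(\delta)}u - 3\gamma\hat{\lambda}_1 - 3\gamma\hat{\lambda}_1 \\
&\ge \left(\frac{1-\varepsilon_m}{1+\varepsilon_m}\right)u^T\hat{\Sigma}_m^{(\delta)}u - 6\gamma\hat{\lambda}_1. \tag{7}
\end{align*}

Let $\hat{d}_1$ be the number of eigenvalues $\hat{\lambda}_i$ that are at least $\hat{\lambda}_1/2$ and let $V_1$ be the subspace of $\mathbb{R}^d$ spanned by $\hat{v}_1,\dots,\hat{v}_{\hat{d}_1}$. Denote by $\Pi_1(X)$ the orthogonal projection of the random variable $X$ (independent of the $X_i$ used to build $V_1$) onto $V_1$. Then for any $u \in V_1 \cap S^{d-1}$, on the event $E_1$, by (7),
\[
u^T\Sigma u \ge \frac{\hat{\lambda}_1}{2}\left(\frac{1-\varepsilon_m}{1+\varepsilon_m} - 12\gamma\right) \ge \frac{\hat{\lambda}_1}{3}
\]
and therefore
\[
\mathbb{E}\,u^T\left(\Pi_1(X) - \mathbb{E}\Pi_1(X)\right)\left(\Pi_1(X) - \mathbb{E}\Pi_1(X)\right)^T u = u^T\Sigma u \in \left(\frac{\hat{\lambda}_1}{3}, \frac{4\hat{\lambda}_1}{3}\right).
\]
In particular, the ratio of the largest and smallest eigenvalues of the covariance matrix of $\Pi_1(X)$ is at most 4 and therefore the distribution of $\Pi_1(X)$ is 4-spherical.

On the other hand, on the event $E_1$, for any unit vector $u \in V_1^\perp \cap S^{d-1}$ in the orthogonal complement of $V_1$, we have $u^T\Sigma u \le 2\lambda_{\max}/3$. To see this, note that $u^T\hat{\Sigma}_m^{(\delta)}u \le \hat{\lambda}_1/2$ and therefore, denoting by $v$ the point in $N_\gamma$ closest to $u$,
\begin{align*}
u^T\Sigma u &= u^T\hat{\Sigma}_m^{(\delta)}u + v^T\left(\Sigma - \hat{\Sigma}_m^{(\delta)}\right)v + \left(v^T\hat{\Sigma}_m^{(\delta)}v - u^T\hat{\Sigma}_m^{(\delta)}u\right) + \left(u^T\Sigma u - v^T\Sigma v\right) \\
&\le \frac{\hat{\lambda}_1}{2} + 2\varepsilon_m\lambda_{\max} + 3\gamma\hat{\lambda}_1 + 3\gamma\lambda_{\max} \\
&\qquad \text{(by (5), (6), and a similar argument for the last term)} \\
&\le \lambda_{\max}\left(\beta\left(\frac{1}{2} + 3\gamma\right) + 2\varepsilon_m + 3\gamma\right) \le \frac{2\lambda_{\max}}{3}.
\end{align*}
In other words, the largest eigenvalue of the covariance matrix of $\Pi_1^\perp(X)$ (the projection of $X$ to the subspace $V_1^\perp$) is at most $(2/3)\lambda_{\max}$.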
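The numerical constants in this step are easy to verify. The sketch below plugs the boundary values $\varepsilon_m = \gamma = 1/100$ into the three expressions that appear above; the function name is an illustrative assumption, and since the text only requires $\varepsilon_m < 1/100$, the boundary value is the least favourable case.

```python
def first_step_constants(eps=0.01, gamma=0.01):
    """Evaluate the constants used in the first peeling step."""
    ratio = (1 + eps) / (1 - eps)
    beta = ratio / (1 - 3 * gamma)
    # Lower bound factor for directions in V_1: needs to be at least 1/3.
    inside = 0.5 * ((1 - eps) / (1 + eps) - 12 * gamma)
    # Upper bound factor for directions orthogonal to V_1: needs to be at most 2/3.
    outside = beta * (0.5 + 3 * gamma) + 2 * eps + 3 * gamma
    return beta, inside, outside


if __name__ == "__main__":
    beta, inside, outside = first_step_constants()
    print(f"beta = {beta:.4f}, inside factor = {inside:.4f}, outside factor = {outside:.4f}")
```

With these values $\beta$ is roughly 1.05, comfortably below $4/3$, the factor for directions in $V_1$ is roughly 0.43, and the factor for directions in $V_1^\perp$ is roughly 0.61, so all three inequalities used in the argument hold with room to spare.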
In the next step we construct the subspace $V_2 \subset V_1^\perp$. To this end, we proceed exactly as in the first step but now we replace $\mathbb{R}^d$ by $V_1^\perp$ and the sample $X_1,\dots,X_m$ of the first block by the variables $\Pi_1^\perp(X_{m+1}),\dots,\Pi_1^\perp(X_{2m}) \in V_1^\perp$. (Recall that $\Pi_1^\perp(X_i)$ is the projection of $X_i$ to the subspace $V_1^\perp$.) Just like in the first step, with probability at least $1-\delta/s$ we obtain a (possibly empty) subspace $V_2$, orthogonal to $V_1$, such that $\Pi_2(X)$, the projection of $X$ on $V_2$, has a 4-spherical distribution and the largest eigenvalue of the covariance matrix of $\Pi_2^\perp(X)$ (the projection of $X$ to the subspace $(V_1 \oplus V_2)^\perp$) is at most $(2/3)^2\lambda_{\max}$.

We repeat the procedure $s$ times and use a union bound over the $s$ events. We obtain, with probability at least $1-\delta$, a sequence of subspaces $V_1,\dots,V_s$ with the following properties:

(i) $V_1,\dots,V_s$ are orthogonal subspaces.
(ii) For each $i = 1,\dots,s$, $\Pi_i(X)$, the projection of $X$ on $V_i$, has a 4-spherical distribution.
(iii) The largest eigenvalue $\lambda_1^{(i)}$ of the covariance matrix of $\Pi_i(X)$ satisfies $\lambda_1^{(i)} \le (2/3)^{i-1}\lambda_{\max}$.
(iv) The largest eigenvalue $\hat{\lambda}_1^{(i)}$ of the estimated covariance matrix of $\Pi_i(X)$ satisfies $(3/4)\lambda_1^{(i)} \le \hat{\lambda}_1^{(i)} \le 1.1\lambda_1^{(i)}$.

Note that it may happen that for some $T < s$ we have $\mathbb{R}^d = V_1 \oplus \cdots \oplus V_T$. In that case we define $V_{T+1} = \cdots = V_s = \emptyset$.

4 Putting it all together

In this section we construct our final multivariate mean estimator and prove Theorem 1. To simplify notation, we assume that the sample size is $2n$. This only affects the value of the universal constant $C$ in the statement of the theorem.

The data is split into two equal halves $(X_1,\dots,X_n)$ and $(X_{n+1},\dots,X_{2n})$. The second half is used to construct the orthogonal spaces $V_1,\dots,V_s$ as described in the previous section. Let $\hat{d}_1,\dots,\hat{d}_s$ denote the dimensions of these subspaces. Recall that, with probability at least $1-\delta$, the construction is successful in the sense that the subspaces satisfy properties (i) to (iv) described at the end of the previous section. Denote this event by $E$.
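The peeling procedure of Section 3 can be illustrated on the level of eigenvalues alone. The sketch below is an idealised, noise-free version that is not the estimator of the paper: it assumes the exact eigenvalues are known, so that the estimated and true covariance matrices coincide, and it only groups eigenvalues rather than building subspaces. The function name is an illustrative assumption.

```python
import math


def peel_spectrum(eigenvalues):
    """Group exact eigenvalues into blocks the way the thresholding rule
    'keep all eigenvalues >= top/2' would, for at most s rounds.

    Idealised sketch: no estimation error, eigenvalues only."""
    remaining = sorted(eigenvalues, reverse=True)
    d = len(remaining)
    # Number of rounds s = ceil(log_{3/2}(d^2)); zero rounds when d = 1.
    s = math.ceil(math.log(d * d) / math.log(1.5))
    blocks = []
    for _ in range(s):
        if not remaining or remaining[0] <= 0:
            break
        top = remaining[0]
        block = [lam for lam in remaining if lam >= top / 2]
        # The list is sorted, so the selected eigenvalues form a prefix.
        remaining = remaining[len(block):]
        blocks.append(block)
    return blocks, remaining


if __name__ == "__main__":
    blocks, leftover = peel_spectrum([16, 9, 8, 3, 2, 1, 0.1, 0.01])
    for i, block in enumerate(blocks, start=1):
        print(f"V_{i}: eigenvalues {block}")
    print("leftover:", leftover)
```

In this idealised setting every block has eigenvalue ratio at most 2 and the largest remaining eigenvalue at least halves in each round. The constants 4 and $2/3$ in properties (ii) and (iii) are the weaker versions of these two facts that survive the estimation error.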
In the rest of the argument we condition on $(X_{n+1},\dots,X_{2n})$ and assume that $E$ occurs. All probabilities below are conditional.

If $\sum_{i=1}^s \hat{d}_i < d$ (i.e., $V_1 \oplus \cdots \oplus V_s \neq \mathbb{R}^d$), then we define $V_{s+1} = (V_1 \oplus \cdots \oplus V_s)^\perp$ and denote by $\hat{d}_{s+1} = d - \sum_{i=1}^s \hat{d}_i$ the dimension of $V_{s+1}$. Let $\Pi_1,\dots,\Pi_{s+1}$ denote the projection operators on the subspaces $V_1,\dots,V_{s+1}$, respectively. For each $i = 1,\dots,s+1$, we use the vectors $\Pi_i(X_1),\dots,\Pi_i(X_n)$ to compute an estimator of the mean $\mathbb{E}\left[\Pi_i(X) \mid (X_{n+1},\dots,X_{2n})\right] = \Pi_i(\mu)$.

For $i = 1,\dots,s$, we use the estimator defined in Section 2. In particular, within the $\hat{d}_i$-dimensional space $V_i$, we compute
\[
\hat{\mu}_i = \hat{\mu}_{n,(4/3)\hat{\lambda}_1^{(i)}}^{(\delta/(s+1))}.
\]
Note that since $\hat{\lambda}_1^{(i)}$ comes from an empirical estimation of $\Sigma$ restricted to an empirical subspace $V_i$, $\hat{\mu}_i$ is an estimator
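constructed on the sample $X_1,\dots,X_n$. Then, by Proposition 1, with probability $1-\delta/(s+1)$,
\[
\left\|\hat{\mu}_i - \Pi_i(\mu)\right\|^2 \le (8e)^2\,\frac{(8/3)\hat{\lambda}_1^{(i)}\left(\hat{d}_i\ln 8 + \ln(e(2\log_{3/2} d + 1)/\delta)\right)}{n}.
\]
In the last subspace $V_{s+1}$, we may use Minsker's estimator, based on $\Pi_{s+1}(X_1),\dots,\Pi_{s+1}(X_n)$, to compute an estimator $\hat{\mu}_{s+1} = \widetilde{\mu}_n^{(\delta/(s+1))}$ of $\Pi_{s+1}(\mu)$. Since the largest eigenvalue of the covariance matrix of $\Pi_{s+1}(X)$ is at most $\lambda_{\max}/d^2$, using (3), we obtain that, with probability $1-\delta/(s+1)$,
\[
\left\|\hat{\mu}_{s+1} - \Pi_{s+1}(\mu)\right\|^2 \le C\,\frac{\lambda_{\max}\log((2\log_{3/2} d + 1)/\delta)}{n}.
\]
Our final estimator is $\hat{\mu}_n^{(\delta)} = \sum_{i=1}^{s+1}\hat{\mu}_i$. By the union bound, we have that, with probability at least $1-\delta$,
\begin{align*}
\left\|\hat{\mu}_n^{(\delta)} - \mu\right\|^2 &= \sum_{i=1}^{s+1}\left\|\hat{\mu}_i - \Pi_i(\mu)\right\|^2 \\
&\le \frac{(8e)^2(8/3)\ln 8}{n}\sum_{i=1}^s \hat{\lambda}_1^{(i)}\hat{d}_i + \frac{(8e)^2(8/3)\ln(e(2\log_{3/2} d + 1)/\delta)}{n}\sum_{i=1}^s \hat{\lambda}_1^{(i)} \\
&\quad + C\,\frac{\lambda_{\max}\log((2\log_{3/2} d + 1)/\delta)}{n}.
\end{align*}
First notice that, by properties (iii) and (iv) at the end of the previous section,
\[
\sum_{i=1}^s \hat{\lambda}_1^{(i)} \le 1.1\lambda_{\max}\sum_{i=1}^s (2/3)^{i-1} \le 3.3\lambda_{\max}.
\]
On the other hand, since
\[
\mathrm{Tr}(\Sigma) = \mathbb{E}\|X-\mu\|^2 = \sum_{i=1}^{s+1}\mathbb{E}\left\|\Pi_i(X) - \Pi_i(\mu)\right\|^2
\]
and for $i \le s$ each $\Pi_i(X)$ has a 4-spherical distribution, we have that
\[
\sum_{i=1}^s \hat{\lambda}_1^{(i)}\hat{d}_i \le 1.1\sum_{i=1}^s \lambda_1^{(i)}\hat{d}_i \le 4.4\,\mathrm{Tr}(\Sigma).
\]
This concludes the proof of Theorem 1.

References

[1] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58.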
[2] S. Bubeck, N. Cesa-Bianchi, and G. Lugosi. Bandits with heavy tail. IEEE Transactions on Information Theory, 59.
[3] O. Catoni. Challenging the empirical mean and empirical variance: a deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 48(4).
[4] O. Catoni. PAC-Bayesian bounds for the Gram matrix and least squares regression with a random design. arXiv preprint.
[5] L. Devroye, M. Lerasle, G. Lugosi, and R.I. Oliveira. Sub-Gaussian mean estimators. Annals of Statistics.
[6] I. Giulini. Robust dimension-free Gram operator estimates. arXiv preprint.
[7] D.L. Hanson and F.T. Wright. A bound on tail probabilities for quadratic forms in independent random variables. Annals of Mathematical Statistics, 42.
[8] D. Hsu. Robust statistics.
[9] D. Hsu and S. Sabato. Loss minimization and parameter estimation with heavy tails. Journal of Machine Learning Research, 17:1–40.
[10] M. Jerrum, L. Valiant, and V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43.
[11] M. Lerasle and R.I. Oliveira. Robust empirical mean estimators. arXiv.
[12] J. Matoušek. Lectures on discrete geometry. Springer.
[13] S. Minsker. Geometric median and robust estimation in Banach spaces. Bernoulli, 21.
[14] A.S. Nemirovsky and D.B. Yudin. Problem complexity and method efficiency in optimization
More informationRiesz-Fischer Sequences and Lower Frame Bounds
Zeitschrift für Aalysis ud ihre Aweduge Joural for Aalysis ad its Applicatios Volume 1 (00), No., 305 314 Riesz-Fischer Sequeces ad Lower Frame Bouds P. Casazza, O. Christese, S. Li ad A. Lider Abstract.
More informationSieve Estimators: Consistency and Rates of Convergence
EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes
More informationRegression with quadratic loss
Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,
More informationFACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures
FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 3
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe
More informationA constructive analysis of convex-valued demand correspondence for weakly uniformly rotund and monotonic preference
MPRA Muich Persoal RePEc Archive A costructive aalysis of covex-valued demad correspodece for weakly uiformly rotud ad mootoic preferece Yasuhito Taaka ad Atsuhiro Satoh. May 04 Olie at http://mpra.ub.ui-mueche.de/55889/
More information17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15
17. Joit distributios of extreme order statistics Lehma 5.1; Ferguso 15 I Example 10., we derived the asymptotic distributio of the maximum from a radom sample from a uiform distributio. We did this usig
More informationLecture 10 October Minimaxity and least favorable prior sequences
STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least
More informationA Note on Matrix Rigidity
A Note o Matrix Rigidity Joel Friedma Departmet of Computer Sciece Priceto Uiversity Priceto, NJ 08544 Jue 25, 1990 Revised October 25, 1991 Abstract I this paper we give a explicit costructio of matrices
More information5.1 Review of Singular Value Decomposition (SVD)
MGMT 69000: Topics i High-dimesioal Data Aalysis Falll 06 Lecture 5: Spectral Clusterig: Overview (cotd) ad Aalysis Lecturer: Jiamig Xu Scribe: Adarsh Barik, Taotao He, September 3, 06 Outlie Review of
More informationLecture 33: Bootstrap
Lecture 33: ootstrap Motivatio To evaluate ad compare differet estimators, we eed cosistet estimators of variaces or asymptotic variaces of estimators. This is also importat for hypothesis testig ad cofidece
More informationA statistical method to determine sample size to estimate characteristic value of soil parameters
A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig
More informationEcon 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara
Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio
More informationCentral limit theorem and almost sure central limit theorem for the product of some partial sums
Proc. Idia Acad. Sci. Math. Sci. Vol. 8, No. 2, May 2008, pp. 289 294. Prited i Idia Cetral it theorem ad almost sure cetral it theorem for the product of some partial sums YU MIAO College of Mathematics
More informationAsymptotic distribution of products of sums of independent random variables
Proc. Idia Acad. Sci. Math. Sci. Vol. 3, No., May 03, pp. 83 9. c Idia Academy of Scieces Asymptotic distributio of products of sums of idepedet radom variables YANLING WANG, SUXIA YAO ad HONGXIA DU ollege
More information1 Introduction to reducing variance in Monte Carlo simulations
Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by
More informationA survey on penalized empirical risk minimization Sara A. van de Geer
A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the
More informationIt should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable.
Chapter 10 Variace Estimatio 10.1 Itroductio Variace estimatio is a importat practical problem i survey samplig. Variace estimates are used i two purposes. Oe is the aalytic purpose such as costructig
More informationHomework Set #3 - Solutions
EE 15 - Applicatios of Covex Optimizatio i Sigal Processig ad Commuicatios Dr. Adre Tkaceko JPL Third Term 11-1 Homework Set #3 - Solutios 1. a) Note that x is closer to x tha to x l i the Euclidea orm
More informationChapter 11 Output Analysis for a Single Model. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation
Chapter Output Aalysis for a Sigle Model Baks, Carso, Nelso & Nicol Discrete-Evet System Simulatio Error Estimatio If {,, } are ot statistically idepedet, the S / is a biased estimator of the true variace.
More informationLecture 16: Achieving and Estimating the Fundamental Limit
EE378A tatistical igal Processig Lecture 6-05/25/207 Lecture 6: Achievig ad Estimatig the Fudametal Limit Lecturer: Jiatao Jiao cribe: William Clary I this lecture, we formally defie the two distict problems
More information5.1. The Rayleigh s quotient. Definition 49. Let A = A be a self-adjoint matrix. quotient is the function. R(x) = x,ax, for x = 0.
40 RODICA D. COSTIN 5. The Rayleigh s priciple ad the i priciple for the eigevalues of a self-adjoit matrix Eigevalues of self-adjoit matrices are easy to calculate. This sectio shows how this is doe usig
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationAlgebra of Least Squares
October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal
More informationOn the convergence rates of Gladyshev s Hurst index estimator
Noliear Aalysis: Modellig ad Cotrol, 2010, Vol 15, No 4, 445 450 O the covergece rates of Gladyshev s Hurst idex estimator K Kubilius 1, D Melichov 2 1 Istitute of Mathematics ad Iformatics, Vilius Uiversity
More informationA RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS
J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a
More informationLecture 12: September 27
36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.
More informationMath 525: Lecture 5. January 18, 2018
Math 525: Lecture 5 Jauary 18, 2018 1 Series (review) Defiitio 1.1. A sequece (a ) R coverges to a poit L R (writte a L or lim a = L) if for each ǫ > 0, we ca fid N such that a L < ǫ for all N. If the
More informationON WELLPOSEDNESS QUADRATIC FUNCTION MINIMIZATION PROBLEM ON INTERSECTION OF TWO ELLIPSOIDS * M. JA]IMOVI], I. KRNI] 1.
Yugoslav Joural of Operatios Research 1 (00), Number 1, 49-60 ON WELLPOSEDNESS QUADRATIC FUNCTION MINIMIZATION PROBLEM ON INTERSECTION OF TWO ELLIPSOIDS M. JA]IMOVI], I. KRNI] Departmet of Mathematics
More informationEstimation of the essential supremum of a regression function
Estimatio of the essetial supremum of a regressio fuctio Michael ohler, Adam rzyżak 2, ad Harro Walk 3 Fachbereich Mathematik, Techische Uiversität Darmstadt, Schlossgartestr. 7, 64289 Darmstadt, Germay,
More informationSTAT Homework 1 - Solutions
STAT-36700 Homework 1 - Solutios Fall 018 September 11, 018 This cotais solutios for Homework 1. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better
More informationEstimation of the Mean and the ACVF
Chapter 5 Estimatio of the Mea ad the ACVF A statioary process {X t } is characterized by its mea ad its autocovariace fuctio γ ), ad so by the autocorrelatio fuctio ρ ) I this chapter we preset the estimators
More informationApproximation theorems for localized szász Mirakjan operators
Joural of Approximatio Theory 152 (2008) 125 134 www.elsevier.com/locate/jat Approximatio theorems for localized szász Miraja operators Lise Xie a,,1, Tigfa Xie b a Departmet of Mathematics, Lishui Uiversity,
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More informationAvailable online at J. Math. Comput. Sci. 2 (2012), No. 3, ISSN:
Available olie at http://scik.org J. Math. Comput. Sci. 2 (202, No. 3, 656-672 ISSN: 927-5307 ON PARAMETER DEPENDENT REFINEMENT OF DISCRETE JENSEN S INEQUALITY FOR OPERATOR CONVEX FUNCTIONS L. HORVÁTH,
More informationAccuracy Assessment for High-Dimensional Linear Regression
Uiversity of Pesylvaia ScholarlyCommos Statistics Papers Wharto Faculty Research -016 Accuracy Assessmet for High-Dimesioal Liear Regressio Toy Cai Uiversity of Pesylvaia Zijia Guo Uiversity of Pesylvaia
More informationSupplementary Materials for Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting
Supplemetary Materials for Statistical-Computatioal Phase Trasitios i Plated Models: The High-Dimesioal Settig Yudog Che The Uiversity of Califoria, Berkeley yudog.che@eecs.berkeley.edu Jiamig Xu Uiversity
More informationLearning Theory: Lecture Notes
Learig Theory: Lecture Notes Kamalika Chaudhuri October 4, 0 Cocetratio of Averages Cocetratio of measure is very useful i showig bouds o the errors of machie-learig algorithms. We will begi with a basic
More informationRademacher Complexity
EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for
More informationLaw of the sum of Bernoulli random variables
Law of the sum of Beroulli radom variables Nicolas Chevallier Uiversité de Haute Alsace, 4, rue des frères Lumière 68093 Mulhouse icolas.chevallier@uha.fr December 006 Abstract Let be the set of all possible
More informationDisjoint Systems. Abstract
Disjoit Systems Noga Alo ad Bey Sudaov Departmet of Mathematics Raymod ad Beverly Sacler Faculty of Exact Scieces Tel Aviv Uiversity, Tel Aviv, Israel Abstract A disjoit system of type (,,, ) is a collectio
More informationECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where
More informationProbabilistic and Average Linear Widths in L -Norm with Respect to r-fold Wiener Measure
joural of approximatio theory 84, 3140 (1996) Article No. 0003 Probabilistic ad Average Liear Widths i L -Norm with Respect to r-fold Wieer Measure V. E. Maiorov Departmet of Mathematics, Techio, Haifa,
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationNotes 19 : Martingale CLT
Notes 9 : Martigale CLT Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: [Bil95, Chapter 35], [Roc, Chapter 3]. Sice we have ot ecoutered weak covergece i some time, we first recall
More informationIntroduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT
Itroductio to Extreme Value Theory Laures de Haa, ISM Japa, 202 Itroductio to Extreme Value Theory Laures de Haa Erasmus Uiversity Rotterdam, NL Uiversity of Lisbo, PT Itroductio to Extreme Value Theory
More informationEmpirical Process Theory and Oracle Inequalities
Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi
More informationBasics of Probability Theory (for Theory of Computation courses)
Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.
More informationChapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities
Chapter 5 Iequalities 5.1 The Markov ad Chebyshev iequalities As you have probably see o today s frot page: every perso i the upper teth percetile ears at least 1 times more tha the average salary. I other
More informationThis section is optional.
4 Momet Geeratig Fuctios* This sectio is optioal. The momet geeratig fuctio g : R R of a radom variable X is defied as g(t) = E[e tx ]. Propositio 1. We have g () (0) = E[X ] for = 1, 2,... Proof. Therefore
More informationA New Solution Method for the Finite-Horizon Discrete-Time EOQ Problem
This is the Pre-Published Versio. A New Solutio Method for the Fiite-Horizo Discrete-Time EOQ Problem Chug-Lu Li Departmet of Logistics The Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog Phoe: +852-2766-7410
More information32 estimating the cumulative distribution function
32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio
More informationDouble Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution
Iteratioal Mathematical Forum, Vol., 3, o. 3, 3-53 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.9/imf.3.335 Double Stage Shrikage Estimator of Two Parameters Geeralized Expoetial Distributio Alaa M.
More informationThe standard deviation of the mean
Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More information