Theoretical Statistics. Lecture 14.
1 Theoretical Statistics. Lecture 14. Peter Bartlett. Metric entropy. 1. Chaining: Dudley's entropy integral.
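2 Recall: Sub-Gaussian processes
Definition: A stochastic process $\theta \mapsto X_\theta$ with indexing set $T$ is sub-Gaussian with respect to a metric $d$ on $T$ if, for all $\theta, \theta' \in T$ and all $\lambda \in \mathbb{R}$,
\[ E \exp\left(\lambda (X_\theta - X_{\theta'})\right) \le \exp\left(\frac{\lambda^2 d(\theta,\theta')^2}{2}\right). \]
Lemma [Finite Classes]: For $X_\theta$ sub-Gaussian wrt $d$ on $T$, and $A$ a set of pairs from $T$,
\[ E \max_{(\theta,\theta') \in A} (X_\theta - X_{\theta'}) \le \max_{(\theta,\theta') \in A} d(\theta,\theta') \sqrt{2 \log |A|}. \]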
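The finite-class lemma can be sanity-checked by simulation. The sketch below is only an illustration under stated assumptions: it takes $X_0 = 0$ and $X_1, \dots, X_m$ independent standard Gaussians, uses the pairs $A = \{(i, 0)\}$ so that $d(i, 0) = 1$ and $|A| = m$, and compares a Monte Carlo estimate of $E \max_i X_i$ with $\sqrt{2 \log m}$. The function names are hypothetical and not part of the lecture.

```python
import math
import random


def expected_max_gaussian(m, trials=2000, seed=0):
    """Monte Carlo estimate of E max_i X_i for m independent N(0, 1) variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, 1.0) for _ in range(m))
    return total / trials


def finite_class_bound(num_pairs, max_dist=1.0):
    """Right-hand side of the finite-class lemma: max d * sqrt(2 log |A|)."""
    return max_dist * math.sqrt(2.0 * math.log(num_pairs))


if __name__ == "__main__":
    for m in (10, 100):
        # The estimate should sit below the bound in each row.
        print(m, expected_max_gaussian(m), finite_class_bound(m))
```

The bound is not tight for small $m$, but both sides grow at the same $\sqrt{\log m}$ rate, which is what the chaining argument exploits.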
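3 Recall: Covering number bound
Theorem: Consider a zero-mean process $X_\theta$ that is sub-Gaussian wrt the metric $d$ on $T$. Suppose that the diameter of $T$ is $D = \sup_{\theta,\theta'} d(\theta,\theta')$. Then for any $\epsilon$,
\[ E \sup_\theta X_\theta \le 2 E \sup_{d(\theta,\theta') \le \epsilon} (X_\theta - X_{\theta'}) + 2D \sqrt{\log N(\epsilon, T, d)}. \]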
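4 Dudley's entropy integral
Theorem: Let $X_\theta$ be a zero-mean stochastic process that is sub-Gaussian wrt a pseudo-metric $d$ on the indexing set $T$. Then
\[ E \sup_\theta X_\theta \le 8\sqrt{2} \int_0^\infty \sqrt{\log N(\epsilon, T, d)} \, d\epsilon. \]
Note that we can always rewrite the integral as an integral from $0$ to the diameter of $T$, since $\log N(\epsilon, T, d) = 0$ once $\epsilon$ is at least the diameter.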
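5 Dudley's entropy integral: Proof
As before, for any fixed $\theta'$,
\[ E \sup_\theta X_\theta = E \sup_\theta (X_\theta - X_{\theta'}) \le E \sup_{\theta,\theta'} (X_\theta - X_{\theta'}), \]
and choosing $\hat\theta \in \hat T$ (a minimal $\epsilon$-cover) with $d(\hat\theta, \theta) \le \epsilon$ (and similarly $\hat\theta'$ for $\theta'$), we have
\[ X_\theta - X_{\theta'} = X_\theta - X_{\hat\theta} + X_{\hat\theta} - X_{\hat\theta'} + X_{\hat\theta'} - X_{\theta'} \le 2 \sup_{d(\theta,\hat\theta) \le \epsilon} (X_\theta - X_{\hat\theta}) + \sup_{\hat\theta,\hat\theta' \in \hat T} (X_{\hat\theta} - X_{\hat\theta'}). \]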
6 Dudley's entropy integral: Proof
Consider bounding $E \sup_{\hat\theta,\hat\theta'} (X_{\hat\theta} - X_{\hat\theta'})$. Previously, we bounded the supremum over the $\epsilon$-cover $\hat T$ (for which the diameter is that of $T$). Instead, we consider a sequence of progressively better approximations to elements of $\hat T$ (which leads to sets with progressively smaller diameters).
Suppose the diameter of $\hat T$ is $D$. We first define $\hat T_k = \hat T$, and think of it as a $(2^{-k} D)$-cover of $\hat T$, where $k = \lceil \log_2 (D/\epsilon) \rceil$ ensures that $2^{-k} D \le \epsilon$. Then we define
\[ \hat T_{i-1} = \text{a minimal } (2^{-(i-1)} D)\text{-cover of } \hat T_i, \]
for $i - 1$ going from $k - 1$ down to $0$. Notice that $\hat T_0$ is a minimal $D$-cover of $\hat T_1$, so $|\hat T_0| = 1$. [PICTURE].
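The construction of the nested covers can be sketched in a few lines. This is a minimal illustration, not the lecture's construction verbatim: it works on a finite set of points on the real line with $d(x, y) = |x - y|$, and it uses a greedy cover in place of a minimal one (an assumption made for simplicity; a greedy cover is valid but may be larger than minimal). The names `greedy_cover`, `build_chain` and `is_cover` are hypothetical.

```python
import math


def greedy_cover(points, radius):
    """Return centers (a subset of points) so that every point is within radius of a center."""
    centers = []
    for x in points:
        if all(abs(x - c) > radius for c in centers):
            centers.append(x)
    return centers


def build_chain(points, eps):
    """Build hat T_k = hat T (an eps-cover) and coarser covers hat T_{k-1}, ..., hat T_0."""
    t_hat = greedy_cover(points, eps)
    diameter = max(t_hat) - min(t_hat)  # D, the diameter of hat T
    if diameter == 0.0:
        return diameter, 0, {0: t_hat}  # a single point: nothing to chain
    k = math.ceil(math.log2(diameter / eps))  # ensures 2^{-k} D <= eps
    chain = {k: t_hat}
    for i in range(k, 0, -1):
        # hat T_{i-1} is a (2^{-(i-1)} D)-cover of hat T_i
        chain[i - 1] = greedy_cover(chain[i], 2.0 ** (-(i - 1)) * diameter)
    return diameter, k, chain


def is_cover(centers, points, radius):
    """Check that every point lies within radius of some center."""
    return all(any(abs(x - c) <= radius for c in centers) for x in points)


if __name__ == "__main__":
    pts = [j / 64 for j in range(65)]
    D, k, chain = build_chain(pts, eps=0.05)
    print("D =", D, "k =", k)
    print("cover sizes from coarsest to finest:", [len(chain[i]) for i in range(k + 1)])
```

Because every point of $\hat T_1$ lies within distance $D$ of the first point chosen, the coarsest level always collapses to a single center, matching $|\hat T_0| = 1$.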
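7 Dudley's entropy integral: Proof
Pick $\hat\theta_k = \hat\theta$, and then pick $\hat\theta_{i-1} \in \hat T_{i-1}$ as the best approximation of $\hat\theta_i$. We can write $\hat\theta_{i-1} = f_{i-1}(\hat\theta_i)$, where $f_{i-1} : \hat T_i \to \hat T_{i-1}$ is the best approximation operator. Then we can write
\[ X_{\hat\theta} = X_{\hat\theta_k} = X_{\hat\theta_0} + \sum_{i=1}^k (X_{\hat\theta_i} - X_{\hat\theta_{i-1}}), \]
and, using the same notation for $\hat\theta'$ (and noting that $\hat\theta_0 = \hat\theta'_0$ because $|\hat T_0| = 1$), we have
\[ X_{\hat\theta} - X_{\hat\theta'} = X_{\hat\theta_k} - X_{\hat\theta'_k} = \sum_{i=1}^k (X_{\hat\theta_i} - X_{\hat\theta_{i-1}}) - \sum_{i=1}^k (X_{\hat\theta'_i} - X_{\hat\theta'_{i-1}}). \]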
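8 Dudley's entropy integral: Proof
Thus,
\[ E \sup_{\hat\theta,\hat\theta' \in \hat T} (X_{\hat\theta} - X_{\hat\theta'}) \le 2 \sum_{i=1}^k E \sup_{\hat\theta_i \in \hat T_i} \left( X_{\hat\theta_i} - X_{f_{i-1}(\hat\theta_i)} \right). \]
Since $d(\hat\theta_i, \hat\theta_{i-1}) \le 2^{-(i-1)} D$, the Finite Lemma shows that
\[ E \sup_{\hat\theta_i \in \hat T_i} \left( X_{\hat\theta_i} - X_{f_{i-1}(\hat\theta_i)} \right) \le 2^{-(i-1)} D \sqrt{2 \log |\hat T_i|} \le 2^{-(i-1)} D \sqrt{2 \log N(2^{-i} D, T)}. \]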
9 Dudley's entropy integral: Proof
Finally, since $\log N(2^{-i} D) \le \log N(u)$ for $u \le 2^{-i} D$, we can bound the area of the rectangle from $(2^{-(i+1)} D, 0)$ to $(2^{-i} D, \sqrt{2 \log N(2^{-i} D)})$ by the integral under $\sqrt{2 \log N(u)}$ for $u$ in that interval (which has length $2^{-(i+1)} D$):
\[ 2^{-(i-1)} D \sqrt{2 \log N(2^{-i} D)} = 4 \cdot 2^{-(i+1)} D \sqrt{2 \log N(2^{-i} D)} \le 4 \int_{2^{-(i+1)} D}^{2^{-i} D} \sqrt{2 \log N(u, T)} \, du. \]
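10 Dudley's entropy integral: Proof
Combining, we have
\begin{align*}
E \sup_\theta X_\theta
&\le 2 E \sup_{d(\theta,\hat\theta) \le \epsilon} (X_\theta - X_{\hat\theta}) + 2 \sum_{i=1}^k E \sup_{\hat\theta_i \in \hat T_i} \left( X_{\hat\theta_i} - X_{f_{i-1}(\hat\theta_i)} \right) \\
&\le 2 E \sup_{d(\theta,\hat\theta) \le \epsilon} (X_\theta - X_{\hat\theta}) + 2 \sum_{i=1}^k 2^{-(i-1)} D \sqrt{2 \log N(2^{-i} D, T)} \\
&\le 2 E \sup_{d(\theta,\hat\theta) \le \epsilon} (X_\theta - X_{\hat\theta}) + 8\sqrt{2} \int_{2^{-(k+1)} D}^{D/2} \sqrt{\log N(u, T)} \, du.
\end{align*}
When $\epsilon \to 0$, the first term goes to zero and (since $k = \lceil \log_2 (D/\epsilon) \rceil$) the second term approaches the integral from $0$ to $D/2$, which gives the result.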
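The step from the dyadic sum to the integral can be checked numerically. The sketch below assumes a hypothetical entropy profile $\log N(u) = p \log(1 + 2/u)$ (chosen only because it is decreasing in $u$; it is not claimed to be the covering number of any particular set) and compares the chaining sum $2 \sum_{i=1}^k 2^{-(i-1)} D \sqrt{2 \log N(2^{-i} D)}$ with $8\sqrt{2} \int_{2^{-(k+1)} D}^{D/2} \sqrt{\log N(u)} \, du$, using a plain midpoint rule for the integral.

```python
import math


def log_covering(u, p=3):
    """Hypothetical entropy profile log N(u) = p * log(1 + 2/u), for illustration only."""
    return p * math.log(1.0 + 2.0 / u)


def midpoint_integral(g, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (j + 0.5) * h) for j in range(steps))


def chaining_sum(D, k, log_n=log_covering):
    """2 * sum_{i=1}^k 2^{-(i-1)} D sqrt(2 log N(2^{-i} D))."""
    return 2.0 * sum(
        2.0 ** (-(i - 1)) * D * math.sqrt(2.0 * log_n(2.0 ** (-i) * D))
        for i in range(1, k + 1)
    )


def entropy_integral_bound(D, k, log_n=log_covering):
    """8 sqrt(2) times the integral of sqrt(log N(u)) over [2^{-(k+1)} D, D/2]."""
    lower = 2.0 ** (-(k + 1)) * D
    return 8.0 * math.sqrt(2.0) * midpoint_integral(
        lambda u: math.sqrt(log_n(u)), lower, D / 2.0
    )


if __name__ == "__main__":
    for k in (1, 4, 8):
        # The sum should not exceed the integral bound at any depth k.
        print(k, chaining_sum(2.0, k), entropy_integral_bound(2.0, k))
```

The inequality holds level by level because each rectangle sits under the decreasing curve on its own dyadic interval, so the gap only grows as more levels are added.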
11 Dudley's entropy integral
We actually proved the following result:
Theorem: Let $X_\theta$ be a zero-mean stochastic process that is sub-Gaussian wrt a pseudo-metric $d$ on the indexing set $T$. Then
\[ E \sup_\theta X_\theta \le 2 E \sup_{d(\theta,\theta') \le \delta} (X_\theta - X_{\theta'}) + 8\sqrt{2} \int_{\delta/2}^{D/2} \sqrt{\log N(\epsilon, T, d)} \, d\epsilon. \]
When the entropy integral does not exist (because $N(\epsilon, T, d)$ grows too quickly as $\epsilon \to 0$), this can still give a useful bound.
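12 Dudley's entropy integral
When does the entropy integral exist? Suppose $T$ has diameter $D$ and $\log N(\epsilon, T, d) = O(\epsilon^{-d})$ (here the exponent $d$ is a positive number, not the metric). Then
\[ \int_0^D \sqrt{\log N(\epsilon, T, d)} \, d\epsilon \le C \int_0^D \epsilon^{-d/2} \, d\epsilon = \frac{C}{1 - d/2} D^{1 - d/2}, \]
provided that $d < 2$. If $\log N(\epsilon, T, d)$ grows like $\epsilon^{-d}$ with $d \ge 2$, the integral does not exist.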
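The threshold at $d = 2$ is easy to see by truncating the integral at a small $\delta > 0$ and letting $\delta$ shrink. The sketch below evaluates the closed form of $\int_\delta^D \epsilon^{-d/2} \, d\epsilon$ (taking $C = 1$, an arbitrary choice for illustration) for exponents below, at, and above 2. The function names are hypothetical.

```python
import math


def truncated_entropy_integral(d, D, delta):
    """Closed form of the integral of eps^(-d/2) over [delta, D]."""
    if d == 2:
        return math.log(D / delta)
    a = 1.0 - d / 2.0
    return (D ** a - delta ** a) / a


def entropy_integral_limit(d, D):
    """Limit as delta -> 0, namely D^(1 - d/2) / (1 - d/2); finite only for d < 2."""
    if d >= 2:
        raise ValueError("the integral diverges for d >= 2")
    return D ** (1.0 - d / 2.0) / (1.0 - d / 2.0)


if __name__ == "__main__":
    for d in (1, 2, 3):
        values = [truncated_entropy_integral(d, 1.0, delta) for delta in (1e-2, 1e-4, 1e-8)]
        # d = 1 settles near a finite limit; d = 2 and d = 3 keep growing.
        print("d =", d, values)
```

For $d = 2$ the growth is only logarithmic in $1/\delta$, which is why the truncated form of the theorem can still give a usable bound in that borderline case.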
13 Entropy Integral: Lipschitz parameterized class
Suppose that $F$ is a parameterized class, $F = \{ f(\theta, \cdot) : \theta \in \Theta \}$, where $\Theta = B_2 \subset \mathbb{R}^p$. The parameterization is $L$-Lipschitz wrt Euclidean distance on $\Theta$, so that for all $x$, $|f(\theta, x) - f(\theta', x)| \le L \|\theta - \theta'\|_2$. Suppose also that $F = -F$ (that is, $F$ is closed under negations).
Theorem:
\[ E \|R_n\|_F = O\left( L \sqrt{\frac{p}{n}} \right). \]
NB: We've lost the log factor.
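14 Entropy Integral: Lipschitz parameterized class
Recall that
\[ n E \|R_n\|_F = E \sup_{f \in F} \left| \langle \epsilon, f(X_1^n) \rangle \right| = E \sup_{f \in F} \langle \epsilon, f(X_1^n) \rangle = E \sup_\theta \langle \epsilon, f(\theta, X_1^n) \rangle, \]
which is sub-Gaussian wrt the Euclidean distance on $\mathbb{R}^n$. Also, recall that
\[ N(\delta, f(\Theta, X_1^n), \|\cdot\|_2) \le N(\delta / (L\sqrt{n}), \Theta, \|\cdot\|_2) \le \left( 1 + \frac{2 L \sqrt{n}}{\delta} \right)^p. \]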
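15 Entropy Integral: Lipschitz parameterized class
Hence,
\begin{align*}
E \|R_n\|_F
&\le \frac{8\sqrt{2}}{n} \int_0^\infty \sqrt{\log N\left( \frac{\epsilon}{L\sqrt{n}}, \Theta, \|\cdot\|_2 \right)} \, d\epsilon \\
&= \frac{8\sqrt{2} L}{\sqrt{n}} \int_0^\infty \sqrt{\log N(\epsilon, \Theta, \|\cdot\|_2)} \, d\epsilon \\
&\le \frac{8\sqrt{2p} L}{\sqrt{n}} \int_0^2 \sqrt{\log\left( 1 + \frac{2}{\epsilon} \right)} \, d\epsilon \\
&\le \frac{8\sqrt{2p} L}{\sqrt{n}} \int_0^2 \sqrt{\log\left( \frac{4}{\epsilon} \right)} \, d\epsilon.
\end{align*}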
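16 Entropy Integral: Lipschitz parameterized class
Integrating by parts (after the substitution $\epsilon = 4 e^{-y^2}$),
\begin{align*}
E \|R_n\|_F
&\le \frac{8\sqrt{2p} L}{\sqrt{n}} \int_0^2 \sqrt{\log\left( \frac{4}{\epsilon} \right)} \, d\epsilon \\
&= \frac{8\sqrt{2p} L}{\sqrt{n}} \left( \left[ -4 e^{-y^2} y \right]_{\sqrt{\log 2}}^{\infty} + 4 \int_{\sqrt{\log 2}}^{\infty} e^{-y^2} \, dy \right) \\
&\le \frac{16 \sqrt{p} L}{\sqrt{n}} \left( \sqrt{2 \log 2} + \sqrt{2\pi} \right) \\
&< 59 L \sqrt{\frac{p}{n}}.
\end{align*}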
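The value of $\int_0^2 \sqrt{\log(4/\epsilon)} \, d\epsilon$ can be cross-checked numerically. The sketch below is a hedged illustration: it evaluates the integration-by-parts expression exactly using the complementary error function (via $4 \int_a^\infty e^{-y^2} dy = 2\sqrt{\pi} \, \mathrm{erfc}(a)$), compares it with a direct midpoint-rule approximation, and also computes the cruder bound obtained by extending the Gaussian tail integral to start at $0$. The function names are hypothetical.

```python
import math


def integral_exact():
    """2 sqrt(log 2) + 4 * int_{sqrt(log 2)}^inf exp(-y^2) dy, using erfc for the tail."""
    a = math.sqrt(math.log(2.0))
    boundary_term = 4.0 * a * math.exp(-a * a)  # equals 2 sqrt(log 2)
    tail_term = 2.0 * math.sqrt(math.pi) * math.erfc(a)
    return boundary_term + tail_term


def integral_upper_bound():
    """Cruder bound: replace the tail integral by the integral over [0, inf)."""
    return 2.0 * math.sqrt(math.log(2.0)) + 2.0 * math.sqrt(math.pi)


def integral_midpoint(steps=100000):
    """Direct midpoint-rule approximation of int_0^2 sqrt(log(4/eps)) d eps."""
    h = 2.0 / steps
    return h * sum(math.sqrt(math.log(4.0 / ((j + 0.5) * h))) for j in range(steps))


if __name__ == "__main__":
    scale = 8.0 * math.sqrt(2.0)  # prefactor multiplying L * sqrt(p / n)
    print("exact integral:   ", integral_exact())
    print("midpoint integral:", integral_midpoint())
    print("crude bound:      ", integral_upper_bound())
    print("constants:", scale * integral_exact(), scale * integral_upper_bound())
```

The exact integral is roughly 2.51, so the constant multiplying $L\sqrt{p/n}$ could be tightened considerably if the tail integral were not bounded so crudely; the rate $L\sqrt{p/n}$ is unaffected.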
17 Entropy Integral: VC-class
Theorem: For $F$ a class of $\{0, 1\}$-valued functions with VC-dimension $d$,
\[ E \|R_n\|_F = O\left( \sqrt{\frac{d}{n}} \right). \]
Compare with the consequence of Sauer's Lemma: $O(\sqrt{d \log(n/d) / n})$. We lose the log factor.
Note: This leads to a faster rate (without the log factor) in the proof of the Glivenko-Cantelli Theorem:
\[ \Pr\left( \|F_n - F\|_\infty \ge \frac{c}{\sqrt{n}} + t \right) \le 2 \exp\left( -\frac{n t^2}{8} \right). \]
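18 Entropy Integral: VC-class
We have
\begin{align*}
E \|R_n\|_F
&\le \frac{8\sqrt{2}}{n} E \int_0^{2\sqrt{n}} \sqrt{\log N(\epsilon, F(X_1^n), \|\cdot\|_2)} \, d\epsilon \\
&= \frac{8\sqrt{2}}{n} E \int_0^{2\sqrt{n}} \sqrt{\log N(\epsilon / \sqrt{n}, F, L_2(P_n))} \, d\epsilon \\
&= \frac{8\sqrt{2}}{\sqrt{n}} E \int_0^2 \sqrt{\log N(\epsilon, F, L_2(P_n))} \, d\epsilon,
\end{align*}
where
\[ \|f - g\|_{L_2(P_n)}^2 = \frac{1}{n} \sum_{i=1}^n (f(X_i) - g(X_i))^2. \]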
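19 Entropy Integral: VC-class
Fact (due to Haussler): $N(\epsilon, F, L_2(P_n)) \le c d (16e)^d \epsilon^{-2d}$. Hence
\begin{align*}
E \|R_n\|_F
&\le \frac{8\sqrt{2}}{\sqrt{n}} E \int_0^2 \sqrt{\log N(\epsilon, F, L_2(P_n))} \, d\epsilon \\
&\le \frac{8\sqrt{2}}{\sqrt{n}} E \int_0^2 \sqrt{\log\left( c d (16e)^d \epsilon^{-2d} \right)} \, d\epsilon \\
&= c \sqrt{\frac{d}{n}},
\end{align*}
where the constant $c$ in the last line may differ from the one in Haussler's bound.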
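To see why the integral scales like $\sqrt{d}$, divide the integrand by $\sqrt{d}$: the quantity $\log(c d (16e)^d \epsilon^{-2d}) / d = \log(cd)/d + \log(16e) - 2 \log \epsilon$ stays bounded in $d$. The sketch below checks this numerically with a midpoint rule. It assumes $c = 1$ for Haussler's unspecified constant (an illustrative choice only) and uses $\log(d)/d \le 1/e$ to form a bound that does not depend on $d$. The function names are hypothetical.

```python
import math


def haussler_integral_over_sqrt_d(d, c=1.0, steps=20000):
    """Midpoint approximation of (1/sqrt(d)) * int_0^2 sqrt(log(c d (16e)^d eps^(-2d))) d eps."""
    h = 2.0 / steps
    total = 0.0
    for j in range(steps):
        eps = (j + 0.5) * h
        per_dim = math.log(c * d) / d + math.log(16.0 * math.e) - 2.0 * math.log(eps)
        total += math.sqrt(per_dim)
    return h * total


def uniform_bound(steps=20000):
    """Same integral with log(d)/d replaced by its maximum 1/e (valid for c = 1, d >= 1)."""
    h = 2.0 / steps
    total = 0.0
    for j in range(steps):
        eps = (j + 0.5) * h
        total += math.sqrt(1.0 / math.e + math.log(16.0 * math.e) - 2.0 * math.log(eps))
    return h * total


if __name__ == "__main__":
    for d in (1, 3, 10, 100, 10000):
        # Each normalized value should stay below the d-independent bound.
        print(d, haussler_integral_over_sqrt_d(d), uniform_bound())
```

Since the normalized integral is bounded uniformly in $d$, multiplying back by $\sqrt{d}$ and by the prefactor $8\sqrt{2}/\sqrt{n}$ gives the $\sqrt{d/n}$ rate with no logarithmic factor in $n$.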
20 An aside: Generic Chaining
Theorem: Let $X_\theta$ be a zero-mean stochastic process that is sub-Gaussian wrt a pseudo-metric $d$ on the indexing set $T$. Then for any probability distribution $\mu$ on $T$,
\[ E \sup_\theta X_\theta \le c \sup_{\theta \in T} \int_0^\infty \sqrt{\log \frac{1}{\mu(B(\theta, \epsilon))}} \, d\epsilon. \]
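21 An aside: Generic Chaining
Talagrand's $\gamma_2$:
Theorem: For $X_\theta$ as above and
\[ \gamma_2(T, d) = \inf_\mu \sup_{\theta \in T} \int_0^\infty \sqrt{\log \frac{1}{\mu(B(\theta, \epsilon))}} \, d\epsilon, \]
we have $E \sup_\theta X_\theta \le c \, \gamma_2(T, d)$.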
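22 Sudakov's Lower Bound
Theorem: For a zero-mean Gaussian process $X_\theta$ defined on $T$, define the variance pseudometric $d(\theta, \theta')^2 = \mathrm{Var}(X_\theta - X_{\theta'})$. Then
\[ E \sup_\theta X_\theta \ge \sup_{\epsilon > 0} \frac{\epsilon}{2} \sqrt{\log M(\epsilon, T, d)}. \]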
23 Sudakov's Lower Bound
Compare with the entropy integral:
Theorem: Let $X_\theta$ be a zero-mean stochastic process that is sub-Gaussian wrt a pseudo-metric $d$ on the indexing set $T$. Then
\[ E \sup_\theta X_\theta \le 8\sqrt{2} \int_0^\infty \sqrt{\log N(\epsilon, T, d)} \, d\epsilon. \]
Suppose that $\mathrm{Var}(X_\theta - X_{\theta'})$ is on the same scale as $d(\theta, \theta')^2$ (think of the Gaussian example of a sub-Gaussian process: this is precisely the variance). Then, modulo constants, the lower bound is the area of the largest rectangle that can fit under the curve $(\epsilon, \sqrt{\log N(\epsilon)})$, whereas the upper bound is the area under the curve.
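The rectangle-versus-area picture can be made concrete with a small numerical comparison. The sketch below is only illustrative: it assumes a hypothetical entropy curve $\sqrt{\log N(\epsilon)}$ with $N(\epsilon) = (1 + 2/\epsilon)^p$ on $(0, 2]$, ignores all constants (including the factor $1/2$ and the difference between packing and covering numbers), and compares the largest rectangle $\sup_\epsilon \epsilon \sqrt{\log N(\epsilon)}$ on a grid with a midpoint approximation of the area under the curve.

```python
import math


def sqrt_log_n(eps, p=3):
    """Hypothetical entropy curve sqrt(log N(eps)) with N(eps) = (1 + 2/eps)^p."""
    return math.sqrt(p * math.log(1.0 + 2.0 / eps))


def largest_rectangle(upper=2.0, steps=20000, p=3):
    """Grid search for sup over eps in (0, upper] of eps * sqrt(log N(eps))."""
    h = upper / steps
    return max((j * h) * sqrt_log_n(j * h, p) for j in range(1, steps + 1))


def area_under_curve(upper=2.0, steps=20000, p=3):
    """Midpoint approximation of the integral of sqrt(log N(eps)) over (0, upper]."""
    h = upper / steps
    return h * sum(sqrt_log_n((j + 0.5) * h, p) for j in range(steps))


if __name__ == "__main__":
    for p in (1, 3, 10):
        # The rectangle (lower-bound shape) never exceeds the area (upper-bound shape).
        print(p, largest_rectangle(p=p), area_under_curve(p=p))
```

For this particular curve the two quantities differ only by a modest constant factor, which is the situation in which Sudakov's bound and Dudley's integral match up to constants.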