ECE 901 Lecture 4: Estimation of Lipschitz smooth functions

R. Nowak 5/7/29

Consider the following setting. Let $Y = f^*(X) + W$, where $X$ is a random variable (r.v.) on $\mathcal{X} = [0,1]$ and $W$ is a r.v. on $\mathcal{Y} = \mathbb{R}$, independent of $X$ and satisfying

$$E[W] = 0 \quad\text{and}\quad E[W^2] = \sigma^2 < \infty.$$

Finally, let $f^* : [0,1] \to \mathbb{R}$ be a function satisfying

$$|f^*(t) - f^*(s)| \le L\,|t - s|, \quad \forall\, t, s \in [0,1], \tag{1}$$

where $L > 0$ is a constant. A function satisfying condition (1) is said to be Lipschitz on $[0,1]$. Notice that such a function must be continuous, but it is not necessarily differentiable. An example of such a function is depicted in Figure 1(a).

Figure 1: Example of a Lipschitz function, and our observations setting. (a) Random sampling of $f^*$; the points correspond to $(X_i, Y_i)$, $i = 1, \ldots, n$. (b) Deterministic sampling of $f^*$; the points correspond to $(i/n, Y_i)$, $i = 1, \ldots, n$.

Note that

$$E[Y \mid X = x] = E[f^*(X) + W \mid X = x] = E[f^*(x) + W \mid X = x] = f^*(x) + E[W] = f^*(x).$$

Consider our usual setup: estimate $f^*$ using $n$ training examples

$$\{X_i, Y_i\}_{i=1}^{n} \overset{\text{i.i.d.}}{\sim} P_{XY}, \qquad Y_i = f^*(X_i) + W_i, \quad i \in \{1, \ldots, n\},$$

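where i.i.d. means independently and identically distributed. Figure 1(a) illustrates this setup.

For simplicity we will consider a slightly different setting. In many applications we can sample $\mathcal{X} = [0,1]$ as we like, and not necessarily at random. For example, we can take $n$ samples uniformly spaced on $[0,1]$:

$$x_i = \frac{i}{n}, \quad i = 1, \ldots, n, \qquad Y_i = f^*(x_i) + W_i = f^*\!\left(\frac{i}{n}\right) + W_i.$$

We will proceed with this setup (as in Figure 1(b)) in the rest of the lecture.

Our goal is to find $\hat f_n$ such that $E[\|f^* - \hat f_n\|^2] \to 0$ as $n \to \infty$ (here $\|\cdot\|$ is the usual $L_2$-norm; i.e., $\|f^* - \hat f_n\|^2 = \int_0^1 |f^*(t) - \hat f_n(t)|^2\,dt$).

Let $\mathcal{F} = \{f : f \text{ is Lipschitz with constant } L\}$. The Risk is defined as

$$R(f) = \|f^* - f\|^2 = \int_0^1 |f^*(t) - f(t)|^2\,dt.$$

The Expected Risk (recall that our estimator $\hat f_n$ is based on $\{x_i, Y_i\}$ and hence is a r.v.) is defined as

$$E[R(\hat f_n)] = E[\|f^* - \hat f_n\|^2].$$

Finally, the Empirical Risk is defined as

$$\hat R_n(f) = \frac{1}{n} \sum_{i=1}^{n} \left( f\!\left(\frac{i}{n}\right) - Y_i \right)^2.$$

For the estimation task we will use stair functions. Let $m \in \mathbb{N}$ and define the class of piecewise constant functions

$$\mathcal{F}_m = \left\{ f : f(t) = \sum_{j=1}^{m} c_j \mathbf{1}\!\left\{ \tfrac{j-1}{m} \le t < \tfrac{j}{m} \right\},\ c_j \in \mathbb{R} \right\}.$$

$\mathcal{F}_m$ is the space of functions that are constant on the intervals $I_{j,m} = \left[\tfrac{j-1}{m}, \tfrac{j}{m}\right)$, $j = 1, \ldots, m$. Clearly, if $m$ is rather large we can approximate almost any bounded function arbitrarily well, so it makes some sense to use these classes to construct a set of sieves. Let $m_1 \le m_2 \le m_3 \le \cdots$ be a sequence of integers satisfying $m_n \to \infty$ as $n \to \infty$. That is, for each value of $n$ there is an associated integer value $m_n$. Define the sieve $\mathcal{F}_1, \mathcal{F}_2, \mathcal{F}_3, \ldots$ by

$$\mathcal{F}_n = \left\{ f : f(t) = \sum_{j=1}^{m_n} c_j \mathbf{1}\{t \in I_{j,m_n}\},\ c_j \in \mathbb{R} \right\}.$$

From here on we will use $m$ instead of $m_n$ and $I_j$ instead of $I_{j,m_n}$, for notational ease. Define $\bar f(t) \in \mathcal{F}_n$ to be an approximation of $f^*$; in particular

$$\bar f(t) = \sum_{j=1}^{m} \bar c_j \mathbf{1}\{t \in I_j\}, \qquad \text{where } \bar c_j = \frac{1}{N_j} \sum_{i:\, i/n \in I_j} f^*\!\left(\frac{i}{n}\right),$$

and $\mathcal{N}_j = \{i \in \{1, \ldots, n\} : i/n \in I_j\}$. Let $N_j$ be the number of elements of $\mathcal{N}_j$, and assume $m$ is not too large relative to $n$, so that $N_j > 0$. In fact $N_j \ge \lfloor n/m \rfloor$, so as long as $m$ grows slightly slower than $n$ we are okay.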
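The fixed-design setup and the bins $I_j$ described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `f_star` is a hypothetical Lipschitz target (with $L = 1$) chosen for the example, the noise is taken to be Gaussian although the text only requires zero mean and finite variance, and the helper names are not from the lecture. Bin membership is computed with integer arithmetic to avoid floating-point trouble at interval boundaries.

```python
import random


def f_star(t):
    # Hypothetical target: Lipschitz with L = 1, continuous, not differentiable at 0.5.
    return abs(t - 0.5)


def sample_fixed_design(n, sigma, rng):
    # Deterministic design x_i = i/n, i = 1..n, with Y_i = f*(x_i) + W_i.
    # Gaussian noise is an assumption; only E[W] = 0 and E[W^2] = sigma^2 are needed.
    xs = [i / n for i in range(1, n + 1)]
    ys = [f_star(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def empirical_risk(f, xs, ys):
    # (1/n) * sum_i (f(x_i) - Y_i)^2
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def bin_of_sample(i, n, m):
    # 0-based index j such that i/n lies in [j/m, (j+1)/m).
    # The last sample x_n = 1 falls outside every half-open interval, so it is
    # clipped into the last bin (a convention assumed here, not stated in the text).
    return min((i * m) // n, m - 1)


def bin_counts(n, m):
    # N_j for each bin: how many design points land in I_j.
    counts = [0] * m
    for i in range(1, n + 1):
        counts[bin_of_sample(i, n, m)] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = sample_fixed_design(100, 0.1, rng)
    print("empirical risk of f*:", empirical_risk(f_star, xs, ys))
    print("bin counts for m = 10:", bin_counts(100, 10))
```

With half-open intervals and $x_i = i/n$, the counts equal $n/m$ only up to boundary effects: the first bin does not contain $x = 0$ and the point $x_n = 1$ has to be assigned somewhere by convention, so individual counts can differ from $n/m$ by one.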

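Exercise 1. Upper bound the error of approximation $\|f^* - \bar f\|^2$.

$$
\begin{aligned}
\|f^* - \bar f\|^2 &= \int_0^1 |f^*(t) - \bar f(t)|^2\,dt \\
&= \sum_{j=1}^{m} \int_{I_j} |f^*(t) - \bar f(t)|^2\,dt \\
&= \sum_{j=1}^{m} \int_{I_j} |f^*(t) - \bar c_j|^2\,dt \\
&= \sum_{j=1}^{m} \int_{I_j} \left| f^*(t) - \frac{1}{N_j} \sum_{i:\, i/n \in I_j} f^*\!\left(\frac{i}{n}\right) \right|^2 dt \\
&= \sum_{j=1}^{m} \int_{I_j} \left| \frac{1}{N_j} \sum_{i:\, i/n \in I_j} \left( f^*(t) - f^*\!\left(\frac{i}{n}\right) \right) \right|^2 dt \\
&\le \sum_{j=1}^{m} \int_{I_j} \left( \frac{1}{N_j} \sum_{i:\, i/n \in I_j} \left| f^*(t) - f^*\!\left(\frac{i}{n}\right) \right| \right)^2 dt \\
&\le \sum_{j=1}^{m} \int_{I_j} \left( \frac{1}{N_j} \sum_{i:\, i/n \in I_j} \frac{L}{m} \right)^2 dt \\
&= \sum_{j=1}^{m} \int_{I_j} \left( \frac{L}{m} \right)^2 dt = \left( \frac{L}{m} \right)^2 .
\end{aligned}
$$

The second inequality uses the Lipschitz condition (1): for $t \in I_j$ and $i/n \in I_j$ we have $|t - i/n| \le 1/m$.

The above implies that $\|f^* - \bar f\|^2 \to 0$ as $n \to \infty$, since $m = m_n \to \infty$ as $n \to \infty$. In words, with $n$ sufficiently large we can approximate $f^*$ to arbitrary accuracy using models in $\mathcal{F}_n$ (even if the functions we are using to approximate $f^*$ are not Lipschitz!).

Of course we cannot compute $\bar f$ without knowing $f^*$, so let's use the data to find a good model in $\mathcal{F}_n$. For any $f \in \mathcal{F}_n$, $f = \sum_{j=1}^{m} c_j \mathbf{1}\{t \in I_j\}$, we have

$$\hat R_n(f) = \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{j=1}^{m} c_j \mathbf{1}\{i/n \in I_j\} - Y_i \right)^2 = \frac{1}{n} \sum_{j=1}^{m} \sum_{i:\, i/n \in I_j} (c_j - Y_i)^2 .$$

Let $\hat f_n = \arg\min_{f \in \mathcal{F}_n} \hat R_n(f)$. Then

$$\hat f_n(t) = \sum_{j=1}^{m} \hat c_j \mathbf{1}\{t \in I_j\}, \qquad \text{where } \hat c_j = \frac{1}{N_j} \sum_{i:\, i/n \in I_j} Y_i . \tag{2}$$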
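As a numerical sanity check of the two results above, the following sketch computes the coefficients $\bar c_j$ and $\hat c_j$ as bin averages, verifies that the approximation error stays below $(L/m)^2$, and confirms that the bin averages of the $Y_i$ give an empirical risk no larger than another choice of coefficients. Everything specific here is an assumption made for illustration: the target `f_star` (with $L = 1$), the Gaussian noise, the midpoint-rule integration, the clipping of $x_n = 1$ into the last bin, and all helper names.

```python
import random

L_CONST = 1.0  # Lipschitz constant of the hypothetical target below


def f_star(t):
    # Hypothetical Lipschitz target, used only for illustration.
    return abs(t - 0.5)


def bin_of_sample(i, n, m):
    # 0-based bin of x_i = i/n; x_n = 1 is clipped into the last bin (assumption).
    return min((i * m) // n, m - 1)


def bin_means(values, n, m):
    # values[i-1] is attached to x_i = i/n; returns the average per bin.
    sums, counts = [0.0] * m, [0] * m
    for i, v in enumerate(values, start=1):
        j = bin_of_sample(i, n, m)
        sums[j] += v
        counts[j] += 1
    if min(counts) == 0:
        raise ValueError("m is too large relative to n: some bin is empty")
    return [s / c for s, c in zip(sums, counts)]


def stair(coeffs):
    # Piecewise constant function taking the value coeffs[j] on [j/m, (j+1)/m).
    m = len(coeffs)
    return lambda t: coeffs[min(int(t * m), m - 1)]


def squared_l2_distance(f, g, grid=20000):
    # Midpoint-rule approximation of the integral of (f - g)^2 over [0, 1].
    h = 1.0 / grid
    return sum((f((k + 0.5) * h) - g((k + 0.5) * h)) ** 2 for k in range(grid)) * h


def empirical_risk_stair(coeffs, ys, n):
    # Empirical risk of the stair function with the given coefficients.
    m = len(coeffs)
    return sum((coeffs[bin_of_sample(i, n, m)] - y) ** 2
               for i, y in enumerate(ys, start=1)) / n


if __name__ == "__main__":
    n, m = 200, 10
    c_bar = bin_means([f_star(i / n) for i in range(1, n + 1)], n, m)
    print("approximation error:", squared_l2_distance(f_star, stair(c_bar)))
    print("bound (L/m)^2:      ", (L_CONST / m) ** 2)
```

For this particular target the bound $(L/m)^2$ is loose, which is expected: it is a worst-case statement over the whole Lipschitz class.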

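Exercise 2. Show (2).

Note that $E[\hat c_j] = \bar c_j$ and therefore $E[\hat f_n(t)] = \bar f(t)$. Let's now analyze the expected risk of $\hat f_n$:

$$
\begin{aligned}
E[\|f^* - \hat f_n\|^2] &= E[\|f^* - \bar f + \bar f - \hat f_n\|^2] \\
&= \|f^* - \bar f\|^2 + E[\|\bar f - \hat f_n\|^2] + 2\,E[\langle f^* - \bar f,\ \bar f - \hat f_n \rangle] \\
&= \|f^* - \bar f\|^2 + E[\|\bar f - \hat f_n\|^2] + 2\,\langle f^* - \bar f,\ E[\bar f - \hat f_n] \rangle \\
&= \|f^* - \bar f\|^2 + E[\|\bar f - \hat f_n\|^2],
\end{aligned}
\tag{3}
$$

where the final step follows from the fact that $E[\hat f_n(t)] = \bar f(t)$.

A couple of important remarks pertaining to the right-hand side of equation (3):

The first term, $\|f^* - \bar f\|^2$, corresponds to the approximation error, and indicates how well we can approximate the function $f^*$ with a function from $\mathcal{F}_n$. Clearly, the larger the class $\mathcal{F}_n$ is, the smaller we can make this term. This term is precisely the squared bias of the estimator $\hat f_n$.

The second term, $E[\|\bar f - \hat f_n\|^2]$, is the estimation error, the variance of our estimator. We will see that the estimation error is small if the class of possible estimators $\mathcal{F}_n$ is also small.

The behavior of the first term in (3) was already studied. Consider the other term:

$$
\begin{aligned}
E[\|\bar f - \hat f_n\|^2] &= E\left[ \int_0^1 |\bar f(t) - \hat f_n(t)|^2\,dt \right] \\
&= E\left[ \sum_{j=1}^{m} \int_{I_j} (\bar c_j - \hat c_j)^2\,dt \right] \\
&= \frac{1}{m} \sum_{j=1}^{m} E\left[ (\bar c_j - \hat c_j)^2 \right] \\
&= \frac{1}{m} \sum_{j=1}^{m} E\left[ \left( \frac{1}{N_j} \sum_{i:\, i/n \in I_j} \left( f^*(i/n) - Y_i \right) \right)^2 \right] \\
&= \frac{1}{m} \sum_{j=1}^{m} E\left[ \left( \frac{1}{N_j} \sum_{i:\, i/n \in I_j} W_i \right)^2 \right] \\
&= \frac{1}{m} \sum_{j=1}^{m} \frac{\sigma^2}{N_j} \\
&\le \frac{1}{m} \sum_{j=1}^{m} \frac{\sigma^2}{\lfloor n/m \rfloor} = \frac{\sigma^2}{\lfloor n/m \rfloor} \le (1 + \epsilon)\,\frac{m}{n}\,\sigma^2,
\end{aligned}
$$

for any $\epsilon > 0$ provided $n/m$ is large enough. The sixth line uses the fact that the $W_i$ are independent with zero mean and variance $\sigma^2$, so the average of $N_j$ of them has variance $\sigma^2 / N_j$. Combining all the facts derived we have

$$E[\|f^* - \hat f_n\|^2] \le \frac{L^2}{m^2} + (1 + \epsilon)\,\frac{m}{n}\,\sigma^2 = O\!\left( \max\left\{ \frac{1}{m^2}, \frac{m}{n} \right\} \right). \tag{4}$$

The notation $x_n = O(y_n)$ (that reads "$x_n$ is big-O $y_n$", or "$x_n$ is of the order of $y_n$ as $n$ goes to infinity") means that $x_n \le C y_n$, where $C$ is a positive constant and $y_n$ is a non-negative sequence.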
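The estimation-error computation above can be checked by simulation. The sketch below repeatedly draws noisy samples, forms the bin averages $\hat c_j$, and compares the Monte Carlo average of $\|\bar f - \hat f_n\|^2 = \frac{1}{m}\sum_j (\bar c_j - \hat c_j)^2$ with the closed form $\frac{1}{m}\sum_j \sigma^2 / N_j$. The target `f_star`, the Gaussian noise model, the clipping of $x_n = 1$ into the last bin, and the function names are all assumptions made for this illustration.

```python
import random


def f_star(t):
    # Hypothetical Lipschitz target (L = 1), used only for illustration.
    return abs(t - 0.5)


def bin_of_sample(i, n, m):
    # 0-based bin of x_i = i/n; x_n = 1 is clipped into the last bin (assumption).
    return min((i * m) // n, m - 1)


def variance_term_exact(counts, sigma):
    # (1/m) * sum_j sigma^2 / N_j
    return sum(sigma ** 2 / c for c in counts) / len(counts)


def variance_term_monte_carlo(n, m, sigma, trials, seed=0):
    # Monte Carlo estimate of E[ ||f_bar - f_hat||^2 ] under Gaussian noise (assumption).
    rng = random.Random(seed)
    bins = [bin_of_sample(i, n, m) for i in range(1, n + 1)]
    signal = [f_star(i / n) for i in range(1, n + 1)]
    counts = [bins.count(j) for j in range(m)]

    # Noise-free bin averages: the coefficients of f_bar.
    c_bar = [0.0] * m
    for j, s in zip(bins, signal):
        c_bar[j] += s / counts[j]

    total = 0.0
    for _ in range(trials):
        sums = [0.0] * m
        for j, s in zip(bins, signal):
            sums[j] += s + rng.gauss(0.0, sigma)
        c_hat = [sums[j] / counts[j] for j in range(m)]
        # Each interval has length 1/m, so the squared L2 distance is the mean over bins.
        total += sum((cb - ch) ** 2 for cb, ch in zip(c_bar, c_hat)) / m
    return total / trials, counts


if __name__ == "__main__":
    n, m, sigma = 200, 10, 0.5
    mc, counts = variance_term_monte_carlo(n, m, sigma, trials=2000)
    print("Monte Carlo estimate:", mc)
    print("(1/m) sum sigma^2/N_j:", variance_term_exact(counts, sigma))
    print("m sigma^2 / n:        ", m * sigma ** 2 / n)
```

The three printed numbers should be close to one another when $n/m$ is moderately large, which is the content of the $(1+\epsilon)\,m\sigma^2/n$ bound.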

What is the best choice of $m$? If $m$ is small then the approximation error (i.e., $O(1/m^2)$) is going to be large, but the estimation error (i.e., $O(m/n)$) is going to be small, and vice versa. These two conflicting goals provide a tradeoff that directs our choice of $m$ (as a function of $n$). In Figure 2 we depict this tradeoff. In Figure 2(a) we considered a large value of $m_n$, and we see that the approximation of $f^*$ by a function in the class $\mathcal{F}_n$ can be very accurate (that is, our estimate will have a small bias), but when we use the measured data our estimate looks very bad (high variance). On the other hand, as illustrated in Figure 2(b), using a very small $m_n$ allows our estimator to get very close to the best approximating function in the class $\mathcal{F}_n$, so we have a low-variance estimator, but the bias of our estimator (i.e., the difference between $\bar f$ and $f^*$) is quite considerable.

Figure 2: Approximation and estimation of $f^*$ (in blue) for a fixed sample size $n$. The function $\bar f$ is depicted in green and the function $\hat f_n$ is depicted in red. Panel (a) uses a large $m$ and panel (b) a small $m$.

We need to balance the two terms in the right-hand side of (4) in order to maximize the rate of decay (with $n$) of the expected risk. This implies that $\frac{1}{m^2} = \frac{m}{n}$, therefore $m_n = n^{1/3}$ and the Mean Squared Error (MSE) is

$$E[\|f^* - \hat f_n\|^2] = O(n^{-2/3}).$$

So the sieve $\mathcal{F}_1, \mathcal{F}_2, \ldots$ with $m_n = n^{1/3}$ produces an $\mathcal{F}$-consistent estimator for $f^* \in \mathcal{F}$.

It is interesting to note that the rate of decay of the MSE we obtain with this strategy cannot be further improved by using more sophisticated estimation techniques (that is, $n^{-2/3}$ is the minimax MSE rate for this problem). Also, rather surprisingly, we are considering classes of models $\mathcal{F}_n$ that are actually not Lipschitz; therefore our estimator of $f^*$ is not a Lipschitz function, unlike $f^*$ itself.
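The balancing argument can be illustrated numerically. The sketch below minimizes the upper bound $L^2/m^2 + m\sigma^2/n$ from (4) over integer $m$ (dropping the $\epsilon$ for simplicity, which is an assumption of this example) and reports how the minimizer and the minimized bound scale with $n$. It works with the bound only, not with the actual risk of any estimator, and the function names are hypothetical.

```python
def risk_bound(m, n, lipschitz, sigma):
    # Upper bound from (4) with epsilon dropped: L^2/m^2 + m*sigma^2/n.
    return (lipschitz / m) ** 2 + m * sigma ** 2 / n


def best_m(n, lipschitz, sigma):
    # Brute-force search for the integer m in 1..n that minimizes the bound.
    return min(range(1, n + 1), key=lambda m: risk_bound(m, n, lipschitz, sigma))


if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 4, 10 ** 5):
        m = best_m(n, 1.0, 1.0)
        print(
            n,
            m,
            m / n ** (1 / 3),                             # roughly constant in n
            risk_bound(m, n, 1.0, 1.0) * n ** (2 / 3),    # roughly constant in n
        )
```

Setting the derivative of the bound to zero gives the continuous minimizer $m = (2L^2 n/\sigma^2)^{1/3}$, so the exact minimizer differs from $n^{1/3}$ only by a constant factor. That is why both ratios printed above stay roughly constant as $n$ grows, consistent with the $O(n^{-2/3})$ rate.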