Lecture 15: Density estimation
UW-Madison (Statistics), Stat 710, Jan 2018


Why do we estimate a density?

Suppose that $X_1, \dots, X_n$ are i.i.d. random variables from $F$ and that $F$ is unknown but has a Lebesgue p.d.f. $f$. Estimation of $F$ can be done by estimating $f$. Note that the estimators of $F$ derived in §5.1.1 and §5.1.2 do not have Lebesgue p.d.f.'s. Having a density estimator $\hat{f}$, $F$ can be estimated by $\hat{F}(x) = \int_{-\infty}^x \hat{f}(t)\,dt$, which may be better than the empirical c.d.f.; moreover, $f$ itself may be of interest.

Difference quotient

Since $f(t) = F'(t)$ a.e., a simple estimator of $f(t)$ is the difference quotient
$$f_n(t) = \frac{F_n(t + \lambda_n) - F_n(t - \lambda_n)}{2\lambda_n}, \qquad t \in \mathbb{R},$$
where $F_n$ is the empirical c.d.f. and $\{\lambda_n\}$ is a sequence of positive constants.

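As an illustration (not part of the original slides), here is a minimal Python sketch of the difference quotient built from the empirical c.d.f.; the sample, the bandwidth `lam`, and the function names are our own choices.

```python
import numpy as np

def ecdf(x, sample):
    """Empirical c.d.f.: F_n(x) = (number of X_i <= x) / n."""
    return np.searchsorted(np.sort(sample), x, side="right") / len(sample)

def diff_quotient(t, sample, lam):
    """Difference-quotient density estimate
    f_n(t) = [F_n(t + lam) - F_n(t - lam)] / (2 lam)."""
    return (ecdf(t + lam, sample) - ecdf(t - lam, sample)) / (2.0 * lam)

# Illustrative usage: estimate the N(0,1) density at t = 0.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
print(diff_quotient(0.0, x, lam=0.3))   # should be near 1/sqrt(2*pi) ~ 0.399
```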

Properties of the difference quotient

Since $2n\lambda_n f_n(t)$ has the binomial distribution $Bi\big(F(t+\lambda_n) - F(t-\lambda_n),\, n\big)$,
$$E[f_n(t)] \to f(t) \quad \text{if } \lambda_n \to 0 \text{ as } n \to \infty$$
and
$$\mathrm{Var}\big(f_n(t)\big) \to 0 \quad \text{if } \lambda_n \to 0 \text{ and } n\lambda_n \to \infty.$$
Thus, we should choose $\lambda_n$ converging to 0 slower than $n^{-1}$. If we assume that $\lambda_n \to 0$, $n\lambda_n \to \infty$, and $f$ is continuously differentiable at $t$, then it can be shown (exercise) that
$$\mathrm{mse}_{f_n(t)}(F) = \frac{f(t)}{2n\lambda_n}\,[1 + o(1)] + O(\lambda_n^2)$$
and, under the additional condition that $n\lambda_n^3 \to 0$,
$$\sqrt{n\lambda_n}\,[f_n(t) - f(t)] \to_d N\!\left(0, \tfrac{1}{2} f(t)\right).$$
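The variance approximation $\mathrm{Var}(f_n(t)) \approx f(t)/(2n\lambda_n)$ can be checked by simulation; below is a rough Monte Carlo sketch (ours, not from the slides), with $n$, $\lambda_n$, and the replication count chosen arbitrarily but with $\lambda_n$ small so the approximation is visible.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam, t = 10_000, 0.05, 0.0
f_t = 1.0 / np.sqrt(2.0 * np.pi)        # true N(0,1) density at t = 0

def f_n(t, sample, lam):
    # f_n(t) = (# of sample points within lam of t) / (2 n lam)
    return np.sum(np.abs(sample - t) <= lam) / (2.0 * len(sample) * lam)

reps = np.array([f_n(t, rng.standard_normal(n), lam) for _ in range(2000)])
print(reps.var(), f_t / (2 * n * lam))  # Monte Carlo variance vs. f(t)/(2 n lam)
```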

Kernel density estimators

A useful class of estimators is the class of kernel density estimators of the form
$$\hat{f}(t) = \frac{1}{n\lambda_n} \sum_{i=1}^n w\!\left(\frac{t - X_i}{\lambda_n}\right),$$
where $w$ is a known Lebesgue p.d.f. on $\mathbb{R}$ and is called the kernel. If we choose $w(t) = \frac{1}{2} I_{[-1,1]}(t)$, then $\hat{f}(t)$ is essentially the same as the so-called histogram.

Properties of the kernel density estimator

$\hat{f}$ is a Lebesgue density on $\mathbb{R}$, since
$$\int \hat{f}(t)\,dt = \frac{1}{n\lambda_n} \sum_{i=1}^n \int w\!\left(\frac{t - X_i}{\lambda_n}\right) dt = \int w(y)\,dy = 1.$$
The bias of $\hat{f}(t)$ as an estimator of $f(t)$ is
$$E[\hat{f}(t)] - f(t) = \frac{1}{\lambda_n} \int w\!\left(\frac{t - z}{\lambda_n}\right) f(z)\,dz - f(t) = \int w(y)\,[f(t - \lambda_n y) - f(t)]\,dy.$$

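Before analyzing the bias and variance further, here is a short Python sketch of the kernel estimator itself (our illustration; the kernel choice and the names are ours). With the rectangular kernel $w(t) = \frac{1}{2} I_{[-1,1]}(t)$ it reproduces the histogram-type estimate noted above.

```python
import numpy as np

def kernel_density(t, sample, lam, w):
    """Kernel density estimate f_hat(t) = (1/(n lam)) sum_i w((t - X_i)/lam).
    `t` may be a scalar or an array of evaluation points."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (t[:, None] - sample[None, :]) / lam      # (t - X_i)/lam for all pairs
    return w(u).sum(axis=1) / (len(sample) * lam)

# Rectangular kernel w(t) = (1/2) I_{[-1,1]}(t)
rect = lambda u: 0.5 * (np.abs(u) <= 1.0)

rng = np.random.default_rng(2)
x = rng.standard_normal(500)
print(kernel_density(np.linspace(-2, 2, 5), x, lam=0.3, w=rect))
```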

Properties of the kernel density estimator

If $f$ is bounded and continuous at $t$, then, by the dominated convergence theorem, the bias of $\hat{f}(t)$ converges to 0 as $\lambda_n \to 0$.

If $f'$ is bounded and continuous at $t$ and $\int |t|\,w(t)\,dt < \infty$, then the bias of $\hat{f}(t)$ is $O(\lambda_n)$.

If $f$ is bounded and continuous at $t$ and $w_0 = \int [w(t)]^2\,dt < \infty$, then the variance of $\hat{f}(t)$ is
$$\begin{aligned}
\mathrm{Var}\big(\hat{f}(t)\big) &= \frac{1}{n\lambda_n^2}\,\mathrm{Var}\!\left(w\!\left(\frac{t - X_1}{\lambda_n}\right)\right) \\
&= \frac{1}{n\lambda_n^2} \int \left[w\!\left(\frac{t - z}{\lambda_n}\right)\right]^2 f(z)\,dz - \frac{1}{n}\left[\frac{1}{\lambda_n}\int w\!\left(\frac{t - z}{\lambda_n}\right) f(z)\,dz\right]^2 \\
&= \frac{1}{n\lambda_n} \int [w(y)]^2 f(t - \lambda_n y)\,dy + O\!\left(\frac{1}{n}\right) \\
&= \frac{w_0 f(t)}{n\lambda_n}\,[1 + o(1)].
\end{aligned}$$

Hence, if $\lambda_n \to 0$, $n\lambda_n \to \infty$, and $f$ is bounded and continuous at $t$, then
$$\mathrm{mse}_{\hat{f}(t)}(F) = \frac{w_0 f(t)}{n\lambda_n}\,[1 + o(1)] + O(\lambda_n^2).$$
If $\lambda_n \to 0$, $n\lambda_n \to \infty$, $f$ is bounded and continuous at $t$, and $w_0 = \int [w(t)]^2\,dt < \infty$, then
$$\sqrt{n\lambda_n}\,\big\{\hat{f}(t) - E[\hat{f}(t)]\big\} \to_d N\big(0,\, w_0 f(t)\big).$$
This can be shown as follows. Let $Y_i = w\big((t - X_i)/\lambda_n\big)$. Then $Y_1, \dots, Y_n$ are independent and identically distributed with
$$E(Y_1) = \int w\!\left(\frac{t - x}{\lambda_n}\right) f(x)\,dx = \lambda_n \int w(y)\, f(t - \lambda_n y)\,dy = O(\lambda_n)$$
and

$$\begin{aligned}
\mathrm{Var}(Y_1) &= \int \left[w\!\left(\frac{t - x}{\lambda_n}\right)\right]^2 f(x)\,dx - \left[\int w\!\left(\frac{t - x}{\lambda_n}\right) f(x)\,dx\right]^2 \\
&= \lambda_n \int [w(y)]^2 f(t - \lambda_n y)\,dy + O(\lambda_n^2) \\
&= \lambda_n w_0 f(t) + o(\lambda_n),
\end{aligned}$$
since $f$ is bounded and continuous at $t$ and $w_0 = \int [w(t)]^2\,dt < \infty$. Then
$$\mathrm{Var}\big(\hat{f}(t)\big) = \frac{1}{n^2\lambda_n^2} \sum_{i=1}^n \mathrm{Var}(Y_i) = \frac{w_0 f(t)}{n\lambda_n}\,[1 + o(1)].$$
Note that $\hat{f}(t) - E[\hat{f}(t)] = \sum_{i=1}^n [Y_i - E(Y_i)]/(n\lambda_n)$. To apply Lindeberg's central limit theorem to $\hat{f}(t)$, we find, for $\varepsilon > 0$,
$$E\big(Y_1^2\, I_{\{|Y_1 - E(Y_1)| > \varepsilon \sqrt{n\lambda_n}\}}\big) = \lambda_n \int_{|w(y) - E(Y_1)| > \varepsilon \sqrt{n\lambda_n}} [w(y)]^2 f(t - \lambda_n y)\,dy,$$
which converges to 0 under the given conditions.

This proves $\sqrt{n\lambda_n}\,\{\hat{f}(t) - E[\hat{f}(t)]\} \to_d N\big(0,\, w_0 f(t)\big)$. Furthermore,
$$E[\hat{f}(t)] - f(t) = \lambda_n^{-1} E(Y_1) - f(t) = \int w(y)\,[f(t - \lambda_n y) - f(t)]\,dy = -\lambda_n \int y\,w(y)\,f'(\xi_{t,y,n})\,dy,$$
where $|\xi_{t,y,n} - t| \le \lambda_n |y|$. If $f'$ is bounded and continuous at $t$, $\int |t|\,w(t)\,dt < \infty$, and $n\lambda_n^3 \to 0$, then
$$\sqrt{n\lambda_n}\,\big\{E[\hat{f}(t)] - f(t)\big\} = O\big(\sqrt{n\lambda_n^3}\big) \to 0$$
and
$$\sqrt{n\lambda_n}\,\big\{\hat{f}(t) - f(t)\big\} \to_d N\big(0,\, w_0 f(t)\big).$$
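The limiting normal law can be checked numerically; below is a rough Monte Carlo sketch (ours, not from the slides) using the rectangular kernel, for which $w_0 = \int [w(t)]^2\,dt = 1/2$. The constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, lam, t = 10_000, 0.05, 0.0
f_t = 1.0 / np.sqrt(2.0 * np.pi)    # N(0,1) density at t = 0
w0 = 0.5                            # integral of w^2 for w = (1/2) I_[-1,1]

def f_hat(t, x, lam):               # rectangular-kernel density estimate
    return np.mean(0.5 * (np.abs((t - x) / lam) <= 1.0)) / lam

z = np.array([np.sqrt(n * lam) * (f_hat(t, rng.standard_normal(n), lam) - f_t)
              for _ in range(2000)])
print(z.mean(), z.var())            # mean near 0, variance near w0 * f_t ~ 0.199
```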

Example 5.4

An i.i.d. sample of size $n = 200$ was generated from $N(0,1)$. Density curve estimates, the difference quotient $f_n$ and the kernel estimate $\hat{f}$, are plotted in Figure 5.1 together with the curve of the true p.d.f. For the kernel estimate, $w(t) = \frac{1}{2} e^{-|t|}$ is used and $\lambda_n = 0.4$. From Figure 5.1, it seems that the kernel estimate is much better than the difference quotient.

[Figure 5.1. Density estimates in Example 5.4: the true p.d.f., estimator (5.26), and estimator (5.29), plotted for $t \in [-2, 2]$.]
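A sketch reproducing the setup of Example 5.4 (our code, not the original figure's; the seed and evaluation grid are arbitrary, and no plot is drawn):

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam = 200, 0.4
x = rng.standard_normal(n)                     # i.i.d. sample from N(0,1)

laplace = lambda u: 0.5 * np.exp(-np.abs(u))   # kernel w(t) = (1/2) e^{-|t|}

grid = np.linspace(-2.0, 2.0, 81)
f_hat = laplace((grid[:, None] - x[None, :]) / lam).sum(axis=1) / (n * lam)
true = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
print(np.max(np.abs(f_hat - true)))            # rough closeness on the grid
```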

Nonparametric regression

Assume that $x_i$ is a realization of a univariate random variable $X_i$, and we want to estimate the regression function $\mu(t) = E(Y_i \mid X_i = t)$ based on a random sample $(Y_1, X_1), \dots, (Y_n, X_n)$ from a population with a p.d.f. $f(x, y)$. In nonparametric regression, we do not specify any form of $\mu(t)$ except that it is a smooth function of $t$. A nonparametric estimator of $\mu(t)$ based on a kernel $w(t)$ is
$$\hat{\mu}(t) = \sum_{i=1}^n Y_i\, w\!\left(\frac{t - X_i}{\lambda_n}\right) \Big/ \sum_{i=1}^n w\!\left(\frac{t - X_i}{\lambda_n}\right), \qquad t \in \mathbb{R}.$$
From the previous discussion on the kernel estimation of the p.d.f. of $X_i$, $f(t)$, the denominator divided by $n\lambda_n$ converges in probability to $f(t)$ if $\lambda_n \to 0$ and $n\lambda_n \to \infty$. Hence, since $\mu(t) = \int y f(t, y)\,dy / f(t)$, for the consistency of $\hat{\mu}(t)$ as an estimator of $\mu(t)$ it suffices to show that, for any $t \in \mathbb{R}$,
$$h_n(t) = \frac{1}{n\lambda_n} \sum_{i=1}^n Y_i\, w\!\left(\frac{t - X_i}{\lambda_n}\right)$$
converges in probability to $\int y f(t, y)\,dy$.
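The estimator above is commonly known as the Nadaraya-Watson form of kernel regression; a minimal Python sketch follows (ours; the Gaussian kernel and the sine mean function are illustrative assumptions). The consistency argument continues after the code.

```python
import numpy as np

def mu_hat(t, x, y, lam, w):
    """Kernel regression estimate:
    mu_hat(t) = sum_i Y_i w((t - X_i)/lam) / sum_i w((t - X_i)/lam)."""
    u = w((np.atleast_1d(t)[:, None] - x[None, :]) / lam)
    return (u * y[None, :]).sum(axis=1) / u.sum(axis=1)

gauss = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(5)
x = rng.uniform(-3, 3, 1000)
y = np.sin(x) + 0.3 * rng.standard_normal(1000)
grid = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(mu_hat(grid, x, y, lam=0.3, w=gauss))    # should track sin(grid)
```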

To establish this convergence, consider first the expectation:
$$E[h_n(t)] = \frac{1}{\lambda_n}\, E\!\left[Y_i\, w\!\left(\frac{t - X_i}{\lambda_n}\right)\right] = \frac{1}{\lambda_n} \iint y\, w\!\left(\frac{t - x}{\lambda_n}\right) f(x, y)\,dx\,dy = \iint y\, w(z)\, f(t - \lambda_n z, y)\,dz\,dy.$$
Suppose that $f(x, y)$ is continuous and $f(x, y) \le c(y)\, g(y)$, where $g(y)$ is the p.d.f. of $Y_i$ and $c(y)$ is a function of $y$ satisfying
$$E[|Y_i|\, c(Y_i)] = \int |y|\, c(y)\, g(y)\,dy < \infty.$$
Then, if $\lambda_n \to 0$ as $n \to \infty$, by the dominated convergence theorem,
$$\lim_{n} E[h_n(t)] = \lim_{n} \iint y\, w(z)\, f(t - \lambda_n z, y)\,dz\,dy = \iint y\, w(z)\, f(t, y)\,dz\,dy = \int w(z)\,dz \int y\, f(t, y)\,dy = \int y\, f(t, y)\,dy.$$
Thus, it remains to show that the variance of $h_n(t)$ converges to 0 under some conditions.

$$\mathrm{Var}\big(h_n(t)\big) = \frac{1}{n^2\lambda_n^2} \sum_{i=1}^n \mathrm{Var}\!\left(Y_i\, w\!\left(\frac{t - X_i}{\lambda_n}\right)\right) \le \frac{1}{n\lambda_n^2}\, E\!\left[Y_i\, w\!\left(\frac{t - X_i}{\lambda_n}\right)\right]^2 = \frac{1}{n\lambda_n^2} \iint y^2 \left[w\!\left(\frac{t - x}{\lambda_n}\right)\right]^2 f(x, y)\,dx\,dy = \frac{1}{n\lambda_n} \iint y^2 [w(z)]^2 f(t - \lambda_n z, y)\,dz\,dy.$$
Suppose that $f(x, y)$ is continuous and $f(x, y) \le c(y)\, g(y)$, where $g(y)$ is the p.d.f. of $Y_i$ and $c(y)$ is a function of $y$ satisfying
$$E[Y_i^2\, c(Y_i)] = \int y^2 c(y)\, g(y)\,dy < \infty$$
and $w_0 = \int [w(z)]^2\,dz < \infty$. Then
$$\lim_{n} \iint y^2 [w(z)]^2 f(t - \lambda_n z, y)\,dz\,dy = \iint y^2 [w(z)]^2 f(t, y)\,dz\,dy = \int [w(z)]^2\,dz \int y^2 f(t, y)\,dy < \infty.$$
Hence, the variance of $h_n(t)$ converges to 0 if $n\lambda_n \to \infty$.

Under some more conditions, similar to the estimation of $f(t)$, we can show that, for any $t \in \mathbb{R}$ and some function $\sigma^2(t)$,
$$\sqrt{n\lambda_n}\,[\hat{\mu}(t) - \mu(t)] \to_d N\big(0, \sigma^2(t)\big).$$