Histogram. Härdle, Müller, Sperlich, Werwatz (1995), Nonparametric and Semiparametric Models: An Introduction


Tine Buch-Kromann

Construction

$X_1, \dots, X_n$ i.i.d. random variables with (unknown) density $f$. Aim: estimate the density and display it graphically.

Construction: Divide the range into bins $B_j = [x_0 + (j-1)h,\, x_0 + jh)$, $j \in \mathbb{Z}$, with origin $x_0$ and binwidth $h$. Count the observations in each $B_j$ ($=: n_j$). Normalize to 1: $f_j = n_j/(nh)$ (relative frequencies, divided by $h$). Draw bars with height $f_j$ for bin $B_j$.
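The construction can be sketched in a few lines of Python (a minimal illustration; the function name and interface are my own, not from the text):

```python
import math
from collections import Counter

def histogram(data, x0, h):
    """Histogram density estimate on bins B_j = [x0 + (j-1)h, x0 + jh).

    Returns a dict mapping the bin index j to the bar height
    f_j = n_j / (n * h), so that the bars integrate to 1.
    """
    n = len(data)
    # bin index j (1-based, as in the slides) of each observation
    counts = Counter(math.floor((x - x0) / h) + 1 for x in data)
    return {j: nj / (n * h) for j, nj in counts.items()}
```

For example, `histogram([0.1, 0.2, 0.9, 1.5], 0.0, 1.0)` puts three observations in $B_1 = [0, 1)$ and one in $B_2 = [1, 2)$, giving heights 0.75 and 0.25.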

Formula

Formula of the histogram:

$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} 1(X_i \in B_j)\, 1(x \in B_j)$

Note: Denote by $m_j$ the center of the bin $B_j$. The histogram assigns each $x$ in $B_j = [m_j - \frac{h}{2},\, m_j + \frac{h}{2})$ the same estimate of $f$, namely $\hat{f}_h(m_j)$.

Derivation

Motivation of the histogram: the probability that an observation $X$ falls into the bin $B_j = [m_j - \frac{h}{2},\, m_j + \frac{h}{2})$ is

$P(X \in B_j) = \int_{B_j} f(u)\, du \approx f(m_j)\, h$

Approximate this probability by the relative frequency of observations in the interval:

$P(X \in B_j) \approx \frac{1}{n} \#\{X_i \in B_j\}$

Combining the two, we get

$\hat{f}_h(m_j) = \frac{1}{nh} \#\{X_i \in B_j\}$

Binwidth

The histogram $\hat{f}_h(x)$ depends on the binwidth $h$ and the origin $x_0$. The effect of the choice of binwidth is displayed in four histograms of the same data with different binwidths.

Statistical properties

(Asymptotic) statistical properties of the histogram as an estimator of the unknown density. Let $X_1, \dots, X_n \sim f$.

Consistency: is

$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} 1(X_i \in B_j)\, 1(x \in B_j)$

a consistent estimator of $f(x)$, i.e. does $\hat{f}_h(x) \xrightarrow{P} f(x)$?

Suppose the origin is $x_0 = 0$ and we want to estimate the density at $x \in B_j = [(j-1)h, jh)$. Then

$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} 1(X_i \in B_j)$

Bias and Variance

Bias: $E[\hat{f}_h(x)] - f(x) \approx f'(m_j)\,(m_j - x)$
Note: the bias increases with the slope $f'(m_j)$, and the bias is 0 if $x = m_j$.

Variance: $V[\hat{f}_h(x)] \approx \frac{1}{nh}\, f(x)$
Note: the variance is proportional to $f(x)$ and decreases when $nh$ increases.

The bias increases and the variance decreases when $h$ increases, i.e. we have to find a compromise between bias and variance to find an optimal $h$.
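The bias approximation can be checked exactly for a density with constant slope. A small sketch (the density $f(x) = 2x$ on $[0,1]$ is my own illustrative choice, not from the text): since $E[\hat{f}_h(x)] = P(X \in B_j)/h$, the exact bias follows from the cdf $F(x) = x^2$ and can be compared with $f'(m_j)(m_j - x)$.

```python
import math

def exact_mean_fhat(x, h):
    """E[f_hat_h(x)] = P(X in B_j) / h for bins B_j = [(j-1)h, jh), origin 0,
    under the illustrative density f(x) = 2x on [0,1] with cdf F(x) = x^2."""
    j = math.floor(x / h) + 1
    F = lambda t: min(max(t, 0.0), 1.0) ** 2
    return (F(j * h) - F((j - 1) * h)) / h

h, x, m1 = 0.1, 0.03, 0.05                   # x lies in B_1 = [0, 0.1), centre m_1 = 0.05
bias_exact = exact_mean_fhat(x, h) - 2 * x   # E[f_hat_h(x)] - f(x)
bias_approx = 2 * (m1 - x)                   # f'(m_j) * (m_j - x), with f'(x) = 2
# because f is linear, the approximation is exact here: both equal h - 2x = 0.04
```

With a nonlinear density the two would differ by higher-order terms in $h$; the linear case isolates the leading term of the bias.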

Mean Squared Error (MSE)

$MSE[\hat{f}_h(x)] = E[\hat{f}_h(x) - f(x)]^2 = \text{Variance} + \text{Bias}^2$ (general result)

$\approx \frac{1}{nh}\, f(x) + [f'(m_j)]^2 (m_j - x)^2$

Note: the histogram converges in mean square to $f(x)$ if $h \to 0$ and $nh \to \infty$. That means more and more observations and smaller and smaller binwidths, but not too fast. Convergence in mean square implies convergence in probability: $\hat{f}_h(x)$ is a consistent estimator of $f(x)$.

Bias, variance and MSE for a histogram. Squared bias: thin solid line; variance: dashed line; MSE: thick line.

Mean Integrated Squared Error (MISE)

MSE measures the accuracy of $\hat{f}_h(x)$ as an estimator of $f$ at a single point. For a global quality measure we use the MISE:

$MISE(\hat{f}_h) = E \int \left(\hat{f}_h(x) - f(x)\right)^2 dx = \int MSE[\hat{f}_h(x)]\, dx \approx \frac{1}{nh} + \frac{h^2}{12}\, \|f'\|_2^2 = AMISE(\hat{f}_h)$

where $\|f'\|_2^2 = \int f'(x)^2\, dx$.

Optimal Binwidth

Criterion for selecting an optimal binwidth $h$: select the $h$ that minimizes AMISE. Hence

$\frac{\partial AMISE(\hat{f}_h)}{\partial h} = -\frac{1}{nh^2} + \frac{1}{6}\, h\, \|f'\|_2^2 = 0$

$\Rightarrow\quad h_0 = \left(\frac{6}{n\, \|f'\|_2^2}\right)^{1/3} \sim n^{-1/3}$
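A quick numerical check of this minimization (the value used for $\|f'\|_2^2$ below is an arbitrary illustrative choice): a grid search over $AMISE(h)$ should recover the closed-form minimizer $h_0$.

```python
def amise(h, n, fprime_norm_sq):
    """AMISE(h) = 1/(n*h) + (h^2 / 12) * ||f'||_2^2."""
    return 1.0 / (n * h) + h * h / 12.0 * fprime_norm_sq

n, c = 100, 1.0                                 # c plays the role of ||f'||_2^2
h0 = (6.0 / (n * c)) ** (1.0 / 3.0)             # closed-form minimizer, ~0.39
grid = [0.01 * i for i in range(1, 201)]        # h in (0, 2]
h_best = min(grid, key=lambda h: amise(h, n, c))
# h_best agrees with h0 up to the grid resolution of 0.01
```

The same check works for any positive $n$ and $c$, since the first-order condition has a unique solution.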

Rule-of-thumb binwidth

Problem: $f$ is unknown, so we cannot calculate $\|f'\|_2^2$! Solution: assume that $f$ follows a particular distribution, e.g. the standard normal; then

$\|f'\|_2^2 = \frac{1}{4\sqrt{\pi}}$

Therefore we get the rule-of-thumb binwidth

$h_0 = \left(\frac{24\sqrt{\pi}}{n}\right)^{1/3} \approx 3.5\, n^{-1/3}$
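The rule-of-thumb constant can be reproduced directly (the function name is my own):

```python
import math

def rule_of_thumb_binwidth(n):
    """h0 = (24 * sqrt(pi) / n)^(1/3): the AMISE-optimal binwidth
    when ||f'||_2^2 = 1/(4*sqrt(pi)) (standard normal reference)."""
    return (24.0 * math.sqrt(math.pi) / n) ** (1.0 / 3.0)
```

The constant $(24\sqrt{\pi})^{1/3} \approx 3.49$, which the slides round to 3.5; for $n = 1000$ the rule gives $h_0 \approx 0.35$.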

Origin

The histogram also depends on the choice of the origin $x_0$.

Drawbacks of the histogram:
- constant over each interval (step function)
- results depend on the origin
- binwidth must be chosen
- slow rate of convergence

Solution to the dependence on the origin $x_0$: the averaged shifted histogram (ASH).

Averaged shifted histogram (idea)

The ASH is obtained by averaging over histograms corresponding to different origins. It appears to correspond to a smaller binwidth than the histograms from which it is constructed, but it is not an ordinary histogram with a smaller binwidth.

Averaged shifted histogram

With origin $x_0 = 0$ and bins $B_j = [(j-1)h, jh)$, $j \in \mathbb{Z}$:

Generate $M-1$ new bin grids by shifting each $B_j$ to the right by $kh/M$:

$B_{jk} = \left[\left(j - 1 + \frac{k}{M}\right) h,\ \left(j + \frac{k}{M}\right) h\right), \quad k \in \{1, \dots, M-1\}$

Calculate a histogram for each bin grid:

$\hat{f}_{h,k}(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} 1(X_i \in B_{jk})\, 1(x \in B_{jk})$

Averaged shifted histogram

Compute the average over these estimates:

$\hat{f}_h(x) = \frac{1}{M} \sum_{k=0}^{M-1} \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} 1(X_i \in B_{jk})\, 1(x \in B_{jk}) = \frac{1}{nMh} \sum_{i=1}^{n} \sum_{k=0}^{M-1} \sum_{j} 1(X_i \in B_{jk})\, 1(x \in B_{jk})$

Note: as $M \to \infty$, the ASH no longer depends on the origin, and the step function becomes a continuous function. This is the motivation for kernel density estimation.
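The averaging formula translates directly into code (an illustrative point-evaluation sketch; the interface is my own):

```python
import math

def ash(data, h, M, x):
    """Averaged shifted histogram at a point x: the average of M histograms
    with binwidth h whose origins are shifted by k*h/M, k = 0, ..., M-1
    (base origin x0 = 0)."""
    n = len(data)
    total = 0
    for k in range(M):
        shift = k * h / M
        jx = math.floor((x - shift) / h)          # bin index of x on grid k
        total += sum(1 for xi in data
                     if math.floor((xi - shift) / h) == jx)
    return total / (n * M * h)
```

With $M = 1$ this reduces to the ordinary histogram: for the data [0.1, 0.2, 0.9, 1.5] with $h = 1$, `ash(data, 1.0, 1, 0.5)` gives 0.75, while `ash(data, 1.0, 2, 0.5)` already smooths the estimate at 0.5 down to 0.5.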

Summary (1)

The formula of the histogram with binwidth $h$ and origin $x_0$:

$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} 1(X_i \in B_j)\, 1(x \in B_j)$

where $B_j = [x_0 + (j-1)h,\, x_0 + jh)$ and $j \in \mathbb{Z}$.

Bias: $E[\hat{f}_h(x)] - f(x) \approx f'(m_j)\,(m_j - x)$
Variance: $V[\hat{f}_h(x)] \approx \frac{1}{nh}\, f(x)$
The asymptotic MISE: $AMISE = \frac{1}{nh} + \frac{h^2}{12}\, \|f'\|_2^2$

Summary (2)

The optimal binwidth $h_0$ that minimizes AMISE:

$h_0 = \left(\frac{6}{n\, \|f'\|_2^2}\right)^{1/3} \sim n^{-1/3}$

The optimal binwidth that minimizes AMISE for $f = N(0,1)$ (rule of thumb): $h_0 \approx 3.5\, n^{-1/3}$

The averaged shifted histogram (ASH):

$\hat{f}_h(x) = \frac{1}{nMh} \sum_{i=1}^{n} \sum_{k=0}^{M-1} \sum_{j} 1(X_i \in B_{jk})\, 1(x \in B_{jk})$