Nonparametric Density Estimation (Multidimension)


Nonparametric Density Estimation (Multidimension). Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models: An Introduction. Tine Buch-Kromann, February 19, 2007.

Setup: one-dimensional estimation, multivariate estimation. Consider a d-dimensional data set with sample size n:
$$X_i = (X_{i1}, \ldots, X_{id})^T, \quad i = 1, \ldots, n.$$
Goal: estimate the density f of $X = (X_1, \ldots, X_d)^T$:
$$f(x) = f(x_1, \ldots, x_d).$$

Multivariate kernel density estimator. Kernel density estimator in d dimensions:
$$\hat f_h(x) = \frac{1}{n}\sum_{i=1}^n \frac{1}{h^d}\, K\!\left(\frac{x - X_i}{h}\right) = \frac{1}{n}\sum_{i=1}^n \frac{1}{h^d}\, K\!\left(\frac{x_1 - X_{i1}}{h}, \ldots, \frac{x_d - X_{id}}{h}\right),$$
where K is a multivariate kernel function with d arguments. Note: h is the same for each component.

Multivariate kernel density estimator. Extension to a vector of bandwidths $h = (h_1, \ldots, h_d)^T$:
$$\hat f_h(x) = \frac{1}{n}\sum_{i=1}^n \frac{1}{h_1 \cdots h_d}\, K\!\left(\frac{x_1 - X_{i1}}{h_1}, \ldots, \frac{x_d - X_{id}}{h_d}\right).$$

Kernel function. What form should the multidimensional kernel $K(u) = K(u_1, \ldots, u_d)$ take? Multiplicative kernel:
$$\mathcal{K}(u) = K(u_1) \cdots K(u_d),$$
where K is a univariate kernel function, giving
$$\hat f_h(x) = \frac{1}{n}\sum_{i=1}^n \frac{1}{h_1 \cdots h_d}\, \mathcal{K}\!\left(\frac{x_1 - X_{i1}}{h_1}, \ldots, \frac{x_d - X_{id}}{h_d}\right) = \frac{1}{n}\sum_{i=1}^n \prod_{j=1}^d \frac{1}{h_j}\, K\!\left(\frac{x_j - X_{ij}}{h_j}\right).$$
Note: For a kernel supported on [-1, 1], an observation contributes to the sum only if it lies in the cube $X_{i1} \in [x_1 - h_1, x_1 + h_1), \ldots, X_{id} \in [x_d - h_d, x_d + h_d)$.
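The multiplicative-kernel estimator above can be sketched in a few lines; this assumes the univariate Epanechnikov kernel, and the function names are illustrative, not from the slides:

```python
import numpy as np

def epanechnikov(u):
    """Univariate Epanechnikov kernel K(u) = (3/4)(1 - u^2) on [-1, 1]."""
    return np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)

def kde_product(x, data, h):
    """Multiplicative-kernel density estimate at a single point x.

    x: (d,) evaluation point; data: (n, d) sample; h: (d,) bandwidths.
    """
    u = (x - data) / h                    # (n, d) scaled differences
    k = epanechnikov(u).prod(axis=1)      # product over the d coordinates
    return k.sum() / (len(data) * np.prod(h))
```

Because the Epanechnikov kernel has compact support, only observations inside the bandwidth cube around x contribute, exactly as noted above.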

Kernel function. Spherical / radially symmetric kernel:
$$K(u) \propto \mathcal{K}(\|u\|) \quad\text{or}\quad K(u) = \frac{\mathcal{K}(\|u\|)}{\int_{\mathbb{R}^d} \mathcal{K}(\|u\|)\, du},$$
where $\|u\| = \sqrt{u^T u}$. (Exercise 3.13) The multivariate Epanechnikov (spherical):
$$K(u) \propto (1 - u^T u)\, 1(u^T u \le 1).$$
The multivariate Epanechnikov (multiplicative):
$$K(u) = \left(\tfrac{3}{4}\right)^d (1 - u_1^2)\, 1(|u_1| \le 1) \cdots (1 - u_d^2)\, 1(|u_d| \le 1).$$

Kernel function. Epanechnikov kernel function. Equal bandwidth in each direction: $h = (h_1, h_2)^T = (1, 1)^T$.

Kernel function. Epanechnikov kernel function. Different bandwidth in each direction: $h = (h_1, h_2)^T = (1, 0.5)^T$.

Multivariate kernel density estimator. The general form of the multivariate density estimator with a nonsingular bandwidth matrix H:
$$\hat f_H(x) = \frac{1}{n}\sum_{i=1}^n \frac{1}{\det(H)}\, K\!\left(H^{-1}(x - X_i)\right) = \frac{1}{n}\sum_{i=1}^n K_H(x - X_i),$$
where $K_H(\bullet) = \frac{1}{\det(H)}\, K(H^{-1}\bullet)$.

Multivariate kernel density estimator. The bandwidth matrix includes all the simpler cases. Equal bandwidth h: $H = h I_d$, where $I_d$ is the $d \times d$ identity matrix. Different bandwidths $h_1, \ldots, h_d$: $H = \mathrm{diag}(h_1, \ldots, h_d)$.
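A minimal sketch of the general bandwidth-matrix estimator, here assuming a standard Gaussian kernel $K(u) = (2\pi)^{-d/2}\exp(-u^T u / 2)$ (the kernel choice is an assumption, not fixed by the slides):

```python
import numpy as np

def kde_bandwidth_matrix(x, data, H):
    """KDE with a full (nonsingular) bandwidth matrix H at a single point x.

    Uses the standard Gaussian kernel K(u) = (2*pi)^{-d/2} exp(-||u||^2 / 2).
    """
    n, d = data.shape
    Hinv = np.linalg.inv(H)
    u = (x - data) @ Hinv.T               # row i is H^{-1}(x - X_i)
    k = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * (u ** 2).sum(axis=1))
    return k.sum() / (n * abs(np.linalg.det(H)))
```

With $H = h I_d$ this reduces to the equal-bandwidth estimator, and with $H = \mathrm{diag}(h_1, \ldots, h_d)$ to the vector-bandwidth one.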

Multivariate kernel density estimator. What effect do the off-diagonal elements have? Rule of thumb: use a bandwidth matrix proportional to $\hat\Sigma^{1/2}$, where $\hat\Sigma$ is the covariance matrix of the data. Such a bandwidth corresponds to transforming the data so that they have an identity covariance matrix, i.e. we can use bandwidth matrices to adjust for correlation between the components.

Kernel function. Epanechnikov kernel function. Bandwidth matrix:
$$H = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$$

Properties of the kernel function. K is a density function:
$$\int_{\mathbb{R}^d} K(u)\, du = 1 \quad\text{and}\quad K(u) \ge 0.$$
K is symmetric:
$$\int_{\mathbb{R}^d} u\, K(u)\, du = 0_d.$$
K has a second moment (matrix):
$$\int_{\mathbb{R}^d} u u^T K(u)\, du = \mu_2(K)\, I_d,$$
where $I_d$ denotes the $d \times d$ identity matrix. K has a kernel norm:
$$\|K\|_2^2 = \int K^2(u)\, du.$$

Properties of the kernel function. K is a density function; therefore $\hat f_H$ is also a density function:
$$\int \hat f_H(x)\, dx = 1.$$
The estimate is consistent at any point x:
$$\hat f_H(x) = \frac{1}{n}\sum_{i=1}^n K_H(X_i - x) \xrightarrow{P} f(x).$$

Statistical properties. Bias:
$$E\{\hat f_H(x)\} - f(x) \approx \frac{1}{2}\mu_2(K)\, \mathrm{tr}\{H^T \mathcal{H}_f(x) H\}.$$
Variance:
$$V\{\hat f_H(x)\} \approx \frac{1}{n \det(H)}\, \|K\|_2^2\, f(x).$$
AMISE:
$$\mathrm{AMISE}(H) = \frac{1}{4}\mu_2^2(K) \int \mathrm{tr}\{H^T \mathcal{H}_f(x) H\}^2\, dx + \frac{1}{n \det(H)}\, \|K\|_2^2,$$
where $\mathcal{H}_f$ is the Hessian matrix of f and $\|K\|_2^2$ is the d-dimensional squared $L_2$-norm of K.

Special case. Univariate case: for d = 1 we obtain $H = h$, $\mathcal{K} = K$, $\mathcal{H}_f(x) = f''(x)$. Bias:
$$E\{\hat f_H(x)\} - f(x) \approx \frac{1}{2}\mu_2(K)\, \mathrm{tr}\{H^T \mathcal{H}_f(x) H\} = \frac{1}{2}\mu_2(K)\, h^2 f''(x).$$
Variance:
$$V\{\hat f_H(x)\} \approx \frac{1}{n \det(H)}\, \|K\|_2^2\, f(x) = \frac{1}{nh}\, \|K\|_2^2\, f(x).$$

Bandwidth selection. AMISE-optimal bandwidth: we have a bias-variance trade-off, which is resolved by the AMISE-optimal bandwidth. If h is a scalar, $H = h H_0$ and $\det(H_0) = 1$, then
$$\mathrm{AMISE}(H) = \frac{1}{4} h^4 \mu_2^2(K) \int \left[\mathrm{tr}\{H_0^T \mathcal{H}_f(x) H_0\}\right]^2 dx + \frac{1}{n h^d}\, \|K\|_2^2.$$
Then the optimal bandwidth and the optimal AMISE are
$$h_{\mathrm{opt}} \sim n^{-1/(4+d)}, \qquad \mathrm{AMISE}(h_{\mathrm{opt}} H_0) \sim n^{-4/(4+d)}.$$
Note: The multivariate density estimator has a slower rate of convergence than the univariate one. For $H = h I_d$ and fixed sample size n, the AMISE-optimal bandwidth is larger in higher dimensions.

Bandwidth selection. Bandwidth selection methods: plug-in method (rule of thumb, generalized Silverman rule of thumb); cross-validation method.

Bandwidth selection. Plug-in method. Idea: optimize AMISE under the assumption that f is a multivariate normal density $N_d(\mu, \Sigma)$ and K is a multivariate Gaussian, i.e. $N_d(0, I)$; then
$$\mu_2(K) = 1, \qquad \|K\|_2^2 = 2^{-d}\pi^{-d/2}.$$
Then
$$\int \mathrm{tr}\{H^T \mathcal{H}_f(x) H\}^2\, dx = \frac{1}{2^{d+2}\pi^{d/2}\det(\Sigma)^{1/2}} \left[ 2\,\mathrm{tr}\{(H^T \Sigma^{-1} H)^2\} + \{\mathrm{tr}(H^T \Sigma^{-1} H)\}^2 \right].$$

Bandwidth selection. Simple case: $H = \mathrm{diag}(h_1, \ldots, h_d)$ and $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$; then
$$h_j = \underbrace{\left(\frac{4}{d+2}\right)^{1/(d+4)}}_{C}\, n^{-1/(d+4)}\, \sigma_j.$$
Silverman's rule of thumb (d = 1):
$$\hat h_{\mathrm{rot}} = \left(\frac{4\hat\sigma^5}{3n}\right)^{1/5}.$$

Bandwidth selection. Replace $\sigma_j$ by $\hat\sigma_j$ and notice that C always lies between 0.924 (d = 11) and 1.059 (d = 1). Scott's rule:
$$\hat h_j = n^{-1/(d+4)}\, \hat\sigma_j.$$
It is not possible to derive the rule of thumb in the general case, but it may be a good idea to choose the bandwidth matrix proportional to the covariance matrix. Generalization of Scott's rule:
$$\hat H = n^{-1/(d+4)}\, \hat\Sigma^{1/2}.$$
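The two versions of Scott's rule can be sketched as follows; taking the symmetric matrix square root via an eigendecomposition is one possible choice of $\hat\Sigma^{1/2}$, not mandated by the slides:

```python
import numpy as np

def scott_bandwidths(data):
    """Diagonal Scott's rule: h_j = n^{-1/(d+4)} * sigma_hat_j."""
    n, d = data.shape
    return n ** (-1.0 / (d + 4)) * data.std(axis=0, ddof=1)

def scott_bandwidth_matrix(data):
    """Generalized Scott's rule: H = n^{-1/(d+4)} * Sigma_hat^{1/2}."""
    n, d = data.shape
    sigma = np.cov(data, rowvar=False)
    vals, vecs = np.linalg.eigh(sigma)    # symmetric square root via eigendecomposition
    sqrt_sigma = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
    return n ** (-1.0 / (d + 4)) * sqrt_sigma
```

The matrix version produces off-diagonal bandwidth entries whenever the components are correlated, implementing the rule-of-thumb adjustment discussed earlier.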

Bandwidth selection. Cross-validation:
$$\mathrm{ISE}(H) = \int \{\hat f_H(x) - f(x)\}^2\, dx = \underbrace{\int \hat f_H^2(x)\, dx}_{\text{calculated from data}} - 2\underbrace{\int \hat f_H(x) f(x)\, dx}_{= E\{\hat f_H(X)\}} + \underbrace{\int f^2(x)\, dx}_{\text{ignore}}.$$
Estimate of the expectation:
$$\widehat{E\{\hat f_H(X)\}} = \frac{1}{n}\sum_{i=1}^n \hat f_{H,-i}(X_i),$$
where the multivariate version of the leave-one-out estimator is
$$\hat f_{H,-i}(x) = \frac{1}{n-1}\sum_{j=1,\, j \ne i}^n K_H(X_j - x).$$

Bandwidth selection. Multivariate cross-validation criterion:
$$\mathrm{CV}(H) = \frac{1}{n^2 \det(H)} \sum_{i=1}^n \sum_{j=1}^n (K \star K)\{H^{-1}(X_j - X_i)\} - \frac{2}{n(n-1)} \sum_{i=1}^n \sum_{j=1,\, j \ne i}^n K_H(X_j - X_i).$$
Note: The bandwidth is a $d \times d$ matrix H, which means we have to minimize over $d(d+1)/2$ parameters. Even if H is a diagonal matrix, we have a d-dimensional optimization problem.
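A sketch of the cross-validation criterion for the diagonal case $H = \mathrm{diag}(h_1, \ldots, h_d)$, assuming a Gaussian kernel so that the convolution $K \star K$ is the $N(0, 2I_d)$ density (that kernel choice, and the function name, are illustrative assumptions):

```python
import numpy as np

def cv_criterion(data, h):
    """Least-squares cross-validation score for H = diag(h), Gaussian kernel."""
    n, d = data.shape
    u = (data[:, None, :] - data[None, :, :]) / h   # (n, n, d) scaled differences
    sq = (u ** 2).sum(axis=2)
    conv = (4 * np.pi) ** (-d / 2) * np.exp(-sq / 4)   # (K*K)(u) = N(0, 2 I_d) density
    term1 = conv.sum() / (n ** 2 * np.prod(h))
    kern = (2 * np.pi) ** (-d / 2) * np.exp(-sq / 2)
    np.fill_diagonal(kern, 0.0)                        # leave-one-out: drop i = j terms
    term2 = 2.0 * kern.sum() / (n * (n - 1) * np.prod(h))
    return term1 - term2
```

One would then minimize this score over h, e.g. on a grid or with a numerical optimizer, which is exactly the d-dimensional optimization problem noted above.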

Canonical bandwidths. The canonical bandwidth of kernel $K^j$:
$$\delta^j = \left\{ \frac{\|K^j\|_2^2}{\mu_2(K^j)^2} \right\}^{1/(d+4)}.$$
Therefore
$$\mathrm{AMISE}(H^j, K^j) = \mathrm{AMISE}(H^i, K^i), \quad\text{where}\quad H^i = \frac{\delta^i}{\delta^j}\, H^j.$$

Canonical bandwidths. Example: adjusting from the Gaussian to the Quartic product kernel.

d | delta_G | delta_Q | delta_Q / delta_G
1 | 0.7764 | 2.0362 | 2.6226
2 | 0.6558 | 1.7100 | 2.6073
3 | 0.5814 | 1.5095 | 2.5964
4 | 0.5311 | 1.3747 | 2.5883
5 | 0.4951 | 1.2783 | 2.5820
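Using the ratios in the table above, converting a Gaussian-kernel bandwidth to the AMISE-equivalent Quartic product-kernel bandwidth is a single rescaling (a small illustrative helper, not from the slides):

```python
# Canonical-bandwidth ratios delta_Q / delta_G from the table, for d = 1..5.
RATIO_Q_OVER_G = {1: 2.6226, 2: 2.6073, 3: 2.5964, 4: 2.5883, 5: 2.5820}

def gaussian_to_quartic(h_gauss, d):
    """Rescale a Gaussian-kernel bandwidth to the AMISE-equivalent Quartic one."""
    return RATIO_Q_OVER_G[d] * h_gauss
```

The same rescaling applies entrywise to a bandwidth matrix H, since $H^i = (\delta^i / \delta^j) H^j$.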

Graphical representation. Example: two dimensions. East-West German migration intention in spring 1991. Explanatory variables: age and household income. Two-dimensional nonparametric density estimate $\hat f_h(x) = \hat f_h(x_1, x_2)$, where the bandwidth matrix is $H = \mathrm{diag}(h)$.

Graphical representation Contour plot

Graphical representation. Example: three dimensions. How can we display three- or even higher-dimensional density estimates? Hold one variable fixed and plot the density function depending on the other variables. For three dimensions we have: $x_1, x_2$ vs. $\hat f_h(x_1, x_2, x_3)$; $x_1, x_3$ vs. $\hat f_h(x_1, x_2, x_3)$; $x_2, x_3$ vs. $\hat f_h(x_1, x_2, x_3)$.
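The slicing idea can be sketched as follows: fix $x_3$ at some value and evaluate the three-dimensional estimate over a grid of $(x_1, x_2)$ values. A product-Gaussian kernel is assumed here as an illustrative choice, and the function names are hypothetical:

```python
import numpy as np

def kde_gauss(x, data, h):
    """Product-Gaussian KDE at a single point x (data: (n, d), h: (d,))."""
    u = (x - data) / h
    k = (2 * np.pi) ** (-data.shape[1] / 2) * np.exp(-0.5 * (u ** 2).sum(axis=1))
    return k.sum() / (len(data) * np.prod(h))

def slice_grid(data, h, fixed_value, grid):
    """Evaluate f_hat(x1, x2, x3 = fixed_value) over a grid of (x1, x2) pairs."""
    return np.array([[kde_gauss(np.array([a, b, fixed_value]), data, h)
                      for b in grid] for a in grid])
```

The resulting matrix can be passed to any contour- or surface-plotting routine; repeating this for each pair of variables gives the three displays listed above.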

Graphical representation Example: Three-dimensions Credit scoring sample. Explanatory variables: Duration of the credit, household income and age.

Graphical representation Contour plot