Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction
Tine Buch-Kromann
Construction
Let $X_1, \dots, X_n$ be i.i.d. random variables with (unknown) density $f$.
Aim: estimate the density and display it graphically.
Construction:
- Divide the range into bins $B_j = [x_0 + (j-1)h,\, x_0 + jh)$, $j \in \mathbb{Z}$, with origin $x_0$ and binwidth $h$.
- Count the observations in each $B_j$ (denote the count by $n_j$).
- Normalize to 1: $f_j = \frac{n_j}{nh}$ (relative frequencies, divided by $h$).
- Draw bars of height $f_j$ over bin $B_j$.
Formula
Formula of the histogram:
$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} \mathbf{1}(X_i \in B_j)\, \mathbf{1}(x \in B_j)$$
Note: Denote by $m_j$ the center of the bin $B_j$. The histogram assigns every $x \in B_j = [m_j - \frac{h}{2}, m_j + \frac{h}{2})$ the same estimate $\hat f_h(m_j)$ for $f$.
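The histogram formula can be sketched in Python with NumPy. This is a minimal version that assumes bins are identified by the index $\lfloor (t - x_0)/h \rfloor$. That index is a relabelling of $B_j$ (it shifts $j$ by one) and does not change the estimate. The function name `histogram_density` is hypothetical.

```python
import numpy as np

def histogram_density(x, data, h, x0=0.0):
    """Histogram estimate f_h(x) with binwidth h and origin x0 (hypothetical helper).

    Every point t is mapped to the bin index floor((t - x0) / h), which is
    constant on each half-open bin [x0 + (j-1)h, x0 + jh).
    """
    scalar = np.ndim(x) == 0
    data = np.asarray(data, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    n = data.size
    data_bins = np.floor((data - x0) / h)
    x_bins = np.floor((x - x0) / h)
    # For each x: sum_i sum_j 1(X_i in B_j) 1(x in B_j) = n_j of the bin holding x
    counts = (x_bins[:, None] == data_bins[None, :]).sum(axis=1)
    est = counts / (n * h)
    return float(est[0]) if scalar else est

# Usage: evaluate on 2000 bin-interior points of [0, 2) with spacing 0.001
sample = [0.1, 0.2, 0.7, 1.5]
grid = np.linspace(0.0005, 1.9995, 2000)
est = histogram_density(grid, sample, h=0.5)
print(est.sum() * 0.001)  # Riemann sum of the bars over [0, 2)
```

Because each bar has height $n_j/(nh)$ and width $h$, the bars integrate to one. The Riemann sum above reflects this.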
Derivation
Motivation of the histogram: the probability that an observation $X$ falls into the bin $B_j = [m_j - \frac{h}{2}, m_j + \frac{h}{2})$ is
$$P(X \in B_j) = \int_{B_j} f(u)\,du \approx f(m_j)\, h.$$
Approximate this probability by the relative frequency of observations in the interval:
$$P(X \in B_j) \approx \frac{1}{n} \#\{X_i \in B_j\}.$$
Combining the two approximations, we get
$$\hat f_h(m_j) = \frac{1}{nh} \#\{X_i \in B_j\}.$$
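The first approximation $\int_{B_j} f(u)\,du \approx f(m_j)\,h$ can be checked numerically. This sketch assumes $X \sim N(0,1)$, so the exact bin probability is available from the normal CDF via `math.erf`. The bin center $m = 0.75$ and the helper names are illustrative choices, not from the slides.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def rel_error(m, h):
    """Relative error of P(X in B_j) ~ f(m_j) h for X ~ N(0, 1),
    where B_j = [m - h/2, m + h/2) (hypothetical helper)."""
    exact = normal_cdf(m + h / 2) - normal_cdf(m - h / 2)
    return abs(exact - normal_pdf(m) * h) / exact

for h in (1.0, 0.5, 0.1):
    print(f"h = {h}: relative error {rel_error(0.75, h):.2e}")
```

The approximation is the midpoint rule. Its relative error shrinks roughly like $h^2$, which is why small bins make $f(m_j)h$ a good stand-in for the bin probability.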
Binwidth
The histogram $\hat f_h(m_j)$ depends on the binwidth $h$ and the origin $x_0$. The effect of the choice of binwidth is displayed in the four histograms.
Statistical properties (asymptotic)
We study the statistical properties of the histogram as an estimator of the unknown density. Let $X_1, \dots, X_n$ be i.i.d. with density $f$.
Consistency: we have
$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} \mathbf{1}(X_i \in B_j)\, \mathbf{1}(x \in B_j).$$
Is $\hat f_h(x)$ a consistent estimator of $f(x)$, i.e. does $\hat f_h(x) \xrightarrow{P} f(x)$?
Suppose the origin is $x_0 = 0$. To estimate the density at $x \in B_j = [(j-1)h, jh)$, we use
$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \mathbf{1}(X_i \in B_j).$$
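A small Monte Carlo sketch can illustrate the consistency question. It assumes standard normal data, origin $x_0 = 0$, and the illustrative choice $h = n^{-1/3}$, which makes $h \to 0$ and $nh \to \infty$. The evaluation point, seed, and helper names are hypothetical. The absolute error at a fixed $x$ should typically shrink as $n$ grows.

```python
import math
import numpy as np

def hist_at(x, data, h):
    """Histogram estimate at x with origin x0 = 0 (hypothetical helper).

    Bins are labelled [k h, (k+1) h) with k = floor(x / h); this is the
    slides' B_j = [(j-1)h, jh) with j = k + 1.
    """
    k = math.floor(x / h)
    in_bin = (data >= k * h) & (data < (k + 1) * h)
    return np.count_nonzero(in_bin) / (len(data) * h)

def consistency_errors(x=0.317, sizes=(100, 10_000, 1_000_000), seed=1):
    rng = np.random.default_rng(seed)
    f_x = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)  # true N(0,1) density at x
    errors = []
    for n in sizes:
        h = n ** (-1 / 3)  # h -> 0 while n h = n^(2/3) -> infinity
        data = rng.standard_normal(n)
        errors.append(abs(hist_at(x, data, h) - f_x))
    return errors

print(consistency_errors())
```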
Bias and Variance
Bias:
$$\mathrm{Bias}[\hat f_h(x)] = E[\hat f_h(x) - f(x)] \approx f'(m_j)\,(m_j - x)$$
Note: the bias grows with the slope $|f'(m_j)|$, and it is 0 (to first order) if $x = m_j$.
Variance:
$$V[\hat f_h(x)] \approx \frac{1}{nh} f(x)$$
Note: the variance is proportional to $f(x)$ and decreases as $nh$ increases.
Since $|m_j - x| \le h/2$, the bias increases when $h$ increases, while the variance decreases when $h$ increases. Hence we have to find a compromise between bias and variance to find an optimal $h$.
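The bias and variance approximations can be compared with exact values. This sketch assumes $X \sim N(0,1)$ and origin $x_0 = 0$. Under that assumption the bin count is $\mathrm{Binomial}(n, p_j)$ with $p_j = P(X \in B_j)$, which gives $E[\hat f_h(x)] = p_j/h$ and $V[\hat f_h(x)] = p_j(1-p_j)/(nh^2)$ exactly. The helper names and the point $x = 0.93$ are illustrative.

```python
import math

def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def dpdf(x):
    return -x * pdf(x)  # f'(x) for the standard normal density

def cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def exact_bias_var(x, h, n):
    k = math.floor(x / h)              # x lies in the bin [k h, (k+1) h)
    p = cdf((k + 1) * h) - cdf(k * h)  # P(X in bin); the count is Binomial(n, p)
    bias = p / h - pdf(x)              # E[f_h(x)] = p / h
    var = p * (1 - p) / (n * h * h)
    return bias, var

def approx_bias_var(x, h, n):
    k = math.floor(x / h)
    m = (k + 0.5) * h                  # bin center m_j
    return dpdf(m) * (m - x), pdf(x) / (n * h)

print(exact_bias_var(0.93, 0.1, 1000), approx_bias_var(0.93, 0.1, 1000))
```

The approximate variance drops the factor $(1 - p_j)$ and replaces the bin average of $f$ by $f(x)$. Both differences vanish as $h \to 0$.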
Mean Square Error (MSE)
$$\mathrm{MSE}[\hat f_h(x)] = E\left[(\hat f_h(x) - f(x))^2\right] = V[\hat f_h(x)] + \mathrm{Bias}^2[\hat f_h(x)] \approx \frac{1}{nh} f(x) + [f'(m_j)]^2 (m_j - x)^2$$
The decomposition into variance plus squared bias is a general result.
Note: the histogram converges in mean square to $f(x)$ if $h \to 0$ and $nh \to \infty$. That means more and more observations and a smaller and smaller binwidth, but $h$ must not shrink too fast. Convergence in mean square implies convergence in probability, so $\hat f_h(x)$ is a consistent estimator of $f(x)$.
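The convergence condition can be seen in exact MSE values. This sketch assumes $N(0,1)$ data and origin 0, and uses the Binomial formulas for bias and variance. It compares $h = n^{-1/3}$ (so $h \to 0$ and $nh \to \infty$) with a fixed $h = 0.5$. The point $x = 0.31737$ and the names are illustrative.

```python
import math

def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def histogram_mse(x, h, n):
    """Exact MSE of the histogram at x for N(0,1) data, origin x0 = 0."""
    k = math.floor(x / h)
    p = cdf((k + 1) * h) - cdf(k * h)  # bin probability
    bias = p / h - pdf(x)
    var = p * (1 - p) / (n * h * h)
    return var + bias ** 2             # MSE = Variance + Bias^2

x = 0.31737
sizes = [10**2, 10**4, 10**6, 10**8]
mse_shrinking = [histogram_mse(x, n ** (-1 / 3), n) for n in sizes]  # h -> 0, nh -> inf
mse_fixed = [histogram_mse(x, 0.5, n) for n in sizes]               # h held fixed
print(mse_shrinking)
print(mse_fixed)
```

With $h$ fixed, the variance vanishes but the squared bias does not, so the MSE levels off at a positive value. With the shrinking $h$, both terms go to zero.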
Bias, variance and MSE for a histogram
[Figure] Squared bias: thin solid line. Variance: dashed line. MSE: thick line.
Mean Integrated Squared Error (MISE)
MSE measures the accuracy of $\hat f_h(x)$ as an estimator of $f$ at a single point. But we want a global quality measure, the MISE:
$$\mathrm{MISE}(\hat f_h) = E\left[\int (\hat f_h(x) - f(x))^2\,dx\right] = \int E\left[(\hat f_h(x) - f(x))^2\right] dx = \int \mathrm{MSE}[\hat f_h(x)]\,dx \approx \frac{1}{nh} + \frac{h^2}{12} \|f'\|_2^2 = \mathrm{AMISE}(\hat f_h),$$
where $\|f'\|_2^2 = \int f'(x)^2\,dx$.
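The AMISE formula can be compared with the exact MISE in a concrete case. This sketch assumes $f$ is the $N(0,1)$ density and the origin is 0, and it truncates the bins to $[-10, 10]$. Integrating the Binomial variance and the squared bias bin by bin gives
$\mathrm{MISE} = \frac{1 - \sum_j p_j^2}{nh} + \int f^2 - \frac{1}{h}\sum_j p_j^2$,
with $\int f^2 = 1/(2\sqrt{\pi})$ for $N(0,1)$. The roughness $\|f'\|_2^2$ is computed by a Riemann sum. The helper names are hypothetical.

```python
import math
import numpy as np

def pdf(x):
    return np.exp(-0.5 * x**2) / math.sqrt(2 * math.pi)

def cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# ||f'||_2^2 by a Riemann sum on a fine grid; f'(x) = -x f(x) for N(0,1)
grid, dx = np.linspace(-10, 10, 200_001, retstep=True)
norm_fprime_sq = float(np.sum((grid * pdf(grid)) ** 2) * dx)

def amise(h, n, roughness=norm_fprime_sq):
    return 1 / (n * h) + h**2 / 12 * roughness

def exact_mise(h, n, lo=-10.0, hi=10.0):
    ks = range(math.floor(lo / h), math.ceil(hi / h))
    p = np.array([cdf((k + 1) * h) - cdf(k * h) for k in ks])  # bin probabilities
    sum_p2 = float(np.sum(p**2))
    int_f2 = 1 / (2 * math.sqrt(math.pi))                      # int f^2 for N(0,1)
    return (1 - sum_p2) / (n * h) + int_f2 - sum_p2 / h

print(norm_fprime_sq, 1 / (4 * math.sqrt(math.pi)))
print(exact_mise(0.1, 10_000), amise(0.1, 10_000))
```

The remaining gap mainly comes from the term $-\sum_j p_j^2/(nh) \approx -\int f^2 / n$, which AMISE drops.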
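Optimal Binwidth
Criterion for selecting an optimal binwidth $h$: select the $h$ that minimizes AMISE. Hence
$$\frac{\partial\,\mathrm{AMISE}(\hat f_h)}{\partial h} = -\frac{1}{nh^2} + \frac{1}{6} h \|f'\|_2^2 = 0 \quad\Longrightarrow\quad h_0 = \left(\frac{6}{n \|f'\|_2^2}\right)^{1/3} \sim n^{-1/3}.$$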
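The closed-form minimizer can be checked against a brute-force grid search over $h$. This sketch assumes the roughness of the standard normal, $\|f'\|_2^2 = 1/(4\sqrt{\pi})$, and an illustrative sample size $n = 1000$. The function names are hypothetical.

```python
import math
import numpy as np

def amise(h, n, roughness):
    return 1 / (n * h) + h**2 / 12 * roughness

def optimal_binwidth(n, roughness):
    """Closed-form minimizer of AMISE: h0 = (6 / (n ||f'||_2^2))^(1/3)."""
    return (6 / (n * roughness)) ** (1 / 3)

n = 1000
roughness = 1 / (4 * math.sqrt(math.pi))   # ||f'||_2^2 for N(0,1)
hs = np.linspace(0.01, 2.0, 200_000)
h_grid = hs[np.argmin(amise(hs, n, roughness))]
h_closed = optimal_binwidth(n, roughness)
print(h_grid, h_closed)
```

Multiplying $n$ by 8 halves $h_0$, which is the $n^{-1/3}$ rate.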
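Rule-of-thumb binwidth
Problem: $f$ is unknown, so we cannot calculate $\|f'\|_2^2$.
Solution: assume that $f$ follows a specific distribution, e.g. the standard normal distribution. Then
$$\|f'\|_2^2 = \frac{1}{4\sqrt{\pi}}.$$
Therefore we get the rule-of-thumb binwidth
$$h_0 = \left(\frac{6}{1/(4\sqrt{\pi})}\right)^{1/3} n^{-1/3} = \left(24\sqrt{\pi}\right)^{1/3} n^{-1/3} \approx 3.5\, n^{-1/3}.$$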
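The arithmetic behind the constant 3.5 can be spelled out in a few lines. This sketch assumes, as the slide does, that the data are standard normal. The names `RULE_OF_THUMB_CONST` and `rule_of_thumb_binwidth` are hypothetical.

```python
import math

# ||f'||_2^2 for the standard normal density
ROUGHNESS_N01 = 1 / (4 * math.sqrt(math.pi))

# Constant of the rule of thumb: (6 / ||f'||_2^2)^(1/3) = (24 sqrt(pi))^(1/3)
RULE_OF_THUMB_CONST = (6 / ROUGHNESS_N01) ** (1 / 3)

def rule_of_thumb_binwidth(n):
    """h0 = (24 sqrt(pi))^(1/3) n^(-1/3), assuming N(0,1) data."""
    return RULE_OF_THUMB_CONST * n ** (-1 / 3)

print(RULE_OF_THUMB_CONST)
print(rule_of_thumb_binwidth(1000))
```

The exact constant is about 3.49, which the slide rounds to 3.5. For $n = 1000$ this gives $h_0 \approx 0.35$.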
Origin
The histogram depends on the origin $x_0$.
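A toy example shows the dependence on the origin: the same five observations and the same binwidth $h = 1$ give different estimates at one point. The data values and the helper `histogram_at` are hypothetical, chosen only for illustration.

```python
import numpy as np

def histogram_at(x, data, h, x0):
    """Histogram estimate at a single point x with binwidth h and origin x0."""
    data = np.asarray(data, dtype=float)
    same_bin = np.floor((data - x0) / h) == np.floor((x - x0) / h)
    return np.count_nonzero(same_bin) / (data.size * h)

data = [0.2, 0.4, 0.6, 0.9, 1.1]
for x0 in (0.0, 0.5):
    print(f"origin {x0}: estimate at x = 0.55 is {histogram_at(0.55, data, 1.0, x0):.2f}")
```

With origin 0, the point 0.55 shares the bin $[0, 1)$ with four observations, so the estimate is $4/5 = 0.8$. With origin 0.5, its bin $[0.5, 1.5)$ holds three observations, so the estimate is $0.6$.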
Drawbacks of the histogram
- Constant over intervals (step function)
- Results depend on the origin
- Binwidth choice
- Slow rate of convergence
Solution to the dependence on the origin $x_0$: the averaged shifted histogram (ASH).
Averaged shifted histogram (idea)
The ASH is obtained by averaging over histograms corresponding to different origins. It appears to correspond to a smaller binwidth than the histograms from which it is constructed, but it is not an ordinary histogram with a smaller binwidth.
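Averaged shifted histogram
Start with origin $x_0 = 0$ and bins $B_j = [(j-1)h, jh)$, $j \in \mathbb{Z}$.
Generate $M-1$ new bin grids by shifting each $B_j$ to the right by the amount $kh/M$:
$$B_{jk} = \left[\left(j - 1 + \frac{k}{M}\right)h,\ \left(j + \frac{k}{M}\right)h\right), \quad k \in \{1, \dots, M-1\}.$$
Calculate a histogram for each bin grid:
$$\hat f_{h,k}(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} \mathbf{1}(X_i \in B_{jk})\, \mathbf{1}(x \in B_{jk}).$$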
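Averaged shifted histogram
Compute the average over these $M$ estimates (with $k = 0$ the original grid, $B_{j0} = B_j$):
$$\hat f_h(x) = \frac{1}{M} \sum_{k=0}^{M-1} \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} \mathbf{1}(X_i \in B_{jk})\, \mathbf{1}(x \in B_{jk}) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{Mh} \sum_{k=0}^{M-1} \sum_{j} \mathbf{1}(X_i \in B_{jk})\, \mathbf{1}(x \in B_{jk}).$$
Note: as $M \to \infty$, the ASH no longer depends on the origin, i.e. the step function becomes a continuous function. This motivates kernel density estimation.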
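The ASH can be implemented directly from its definition. This minimal sketch averages $M$ histograms whose origins are $x_0 + kh/M$ for $k = 0, \dots, M-1$. It is a direct (not optimized) implementation, and the function name `ash` is hypothetical.

```python
import numpy as np

def ash(x, data, h, M, x0=0.0):
    """Averaged shifted histogram with binwidth h and M grids (hypothetical helper).

    Grid k (k = 0, ..., M-1) has origin x0 + k*h/M, i.e. its bins B_jk are
    the bins B_j shifted to the right by k*h/M.
    """
    scalar = np.ndim(x) == 0
    data = np.asarray(data, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    total = np.zeros_like(x)
    for k in range(M):
        origin = x0 + k * h / M
        data_bins = np.floor((data - origin) / h)
        x_bins = np.floor((x - origin) / h)
        counts = (x_bins[:, None] == data_bins[None, :]).sum(axis=1)
        total += counts / (data.size * h)  # histogram f_{h,k}(x) on grid k
    est = total / M                        # average over the M grids
    return float(est[0]) if scalar else est

# Usage: one observation at 0, binwidth 1, many shifts
print(ash(np.array([-0.5, 0.0, 0.5]), [0.0], h=1.0, M=1000))
```

For a single observation, the ASH with large $M$ traces out a triangle centered at that observation: 1 at the observation and 0.5 at distance $h/2$. This hints at the step to kernel density estimation. With $M = 1$, the ASH is the ordinary histogram.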
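Summary (1)
The formula of the histogram with binwidth $h$ and origin $x_0$:
$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} \mathbf{1}(X_i \in B_j)\, \mathbf{1}(x \in B_j),$$
where $B_j = [x_0 + (j-1)h, x_0 + jh)$ and $j \in \mathbb{Z}$.
Bias: $E[\hat f_h(x) - f(x)] \approx f'(m_j)(m_j - x)$
Variance: $V[\hat f_h(x)] \approx \frac{1}{nh} f(x)$
The asymptotic MISE: $\mathrm{AMISE} = \frac{1}{nh} + \frac{h^2}{12}\|f'\|_2^2$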
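Summary (2)
The optimal binwidth $h_0$ that minimizes AMISE: $h_0 = \left(\frac{6}{n\|f'\|_2^2}\right)^{1/3} \sim n^{-1/3}$
The optimal binwidth $h_0$ that minimizes AMISE for $N(0,1)$ (rule of thumb): $h_0 \approx 3.5\, n^{-1/3}$
The averaged shifted histogram (ASH): $\hat f_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{Mh} \sum_{k=0}^{M-1} \sum_{j} \mathbf{1}(X_i \in B_{jk})\, \mathbf{1}(x \in B_{jk})$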