Generalized Cross Validation


1 26 November 2009

2 Plan
1. Generals
2. What Regularization Parameter λ is Optimal?
   - Examples
   - Defining the Optimal λ
3. Generalized Cross-Validation (GCV)
   - Convergence Result
4. Discussion

3 Plan: Section 1, Generals

4 References
- Grace Wahba, Spline Models for Observational Data, Chapter 4 (1990)
- Behzad Shahraray and David J. Anderson, Optimal Estimation of Contour Properties by Cross-Validated Regularization (1989)
- Peter Craven and Grace Wahba, Smoothing Noisy Data with Spline Functions (1979)

5 Conditional Expectations
From probability theory we have, for X ∈ L₂(Ω, F, P),

E[X] = argmin_{θ ∈ ℝ} E[(X − θ)²]   (1)

The least squares estimate therefore gives a discretized estimate of θ, and the squared error is a natural norm to choose when the data are disturbed by white noise.
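
A minimal numerical sketch of (1): the sample analogue of E[(X − θ)²], evaluated over a grid of candidate θ, is smallest at the sample mean. The setup and names (rng, thetas) are illustrative choices, not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=2.0, size=5000)   # draws of X with E[X] = 1.5

thetas = np.linspace(-5.0, 5.0, 401)            # candidate values of theta
mse = ((x[:, None] - thetas[None, :]) ** 2).mean(axis=0)

# The grid minimizer and the sample mean agree (both close to 1.5).
print(thetas[mse.argmin()], x.mean())
```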

6 Ill-posedness
Consider the model

y_i = g(t_i) + ɛ_i,  i = 1, 2, ..., n,  t_i ∈ [0, 1],

where g ∈ W₂^(m) = {f : f, f′, ..., f^(m−1) absolutely continuous, f^(m) ∈ L₂[0, 1]} and {ɛ_i} ~ WN(0, σ²) with σ unknown. The least squares estimate gives

min_{ĝ ∈ W₂^(m)} Σ_{i=1}^n (y_i − ĝ(t_i))² = 0,   (2)

attained by any ĝ that interpolates {y_i}_{i=1}^n. The minimum value does not depend on the data, and the problem is clearly ill-posed.
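
A small sketch of the ill-posedness in (2), under an assumed setup (g = sin, piecewise-linear interpolation): any interpolant of the data drives the sum of squares to exactly zero, whatever the noise realization, so the attained minimum carries no information about g.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
t = np.linspace(0.0, 1.0, n)
g = np.sin(2 * np.pi * t)                 # assumed "true" g for this sketch
y = g + rng.normal(scale=0.3, size=n)     # noisy observations

fitted = np.interp(t, t, y)               # a piecewise-linear interpolant
print(((fitted - y) ** 2).sum())          # exactly 0, for every noise draw
print(((fitted - g) ** 2).mean())         # yet the error w.r.t. g is ~ sigma^2
```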

7 Regularization
We may choose to regularize, or smooth, the data as

ĝ_{n,λ} = argmin_{ĝ ∈ W₂^(m)} [ Σ_{i=1}^n (y_i − ĝ(t_i))² + λ ∫₀¹ (ĝ^(m)(t))² dt ],  λ ≥ 0,   (3)

which has a unique solution.
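
A minimal discrete analogue of (3), assuming equally spaced t_i and m = 2: replacing the integral of (ĝ″)² by squared second differences gives the closed form ĝ = (I + λDᵀD)⁻¹y. The helper names and the absorption of grid-spacing constants into λ are choices made for this sketch, not part of the slides.

```python
import numpy as np

def second_difference_matrix(n: int) -> np.ndarray:
    """(n-2) x n matrix D with (D g)_i = g_i - 2 g_{i+1} + g_{i+2}."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def smooth(y: np.ndarray, lam: float) -> np.ndarray:
    """Discrete analogue of (3): solve (I + lam D'D) g = y."""
    n = len(y)
    D = second_difference_matrix(n)
    return np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)

rng = np.random.default_rng(2)
t = np.linspace(0.0, 360.0, 100)
y = np.sin(np.pi * t / 180.0) + rng.normal(scale=0.2, size=t.size)

# From interpolation to heavy smoothing as lam grows.
for lam in (0.0, 10.0, 1e6):
    print(lam, np.round(smooth(y, lam)[:4], 3))
```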

8 Plan: Section 2, What Regularization Parameter λ is Optimal?

9 Properties of ĝ_{n,λ}
The reason for the term smoothing spline is the following:
- For λ = 0, ĝ_{n,λ} can be seen as an interpolating spline.
- For λ = ∞, ĝ_{n,λ} is a single polynomial of degree m − 1 (optimal in the least squares sense).
- For 0 < λ < ∞, it can be shown that ĝ_{n,λ} is composed of polynomials of degree at most 2m − 1 on the intervals [t_i, t_{i+1}], i ∈ {1, 2, ..., n − 1}, such that the function and its derivatives up to and including the (2m − 2)nd derivative are continuous at the knots, and f^(k)(t_1) = f^(k)(t_n) = 0 for k ∈ {m, m + 1, ..., 2m − 2}, i.e. natural conditions at the end points.
From this we see that the quality of the result depends strongly on a good choice of λ.
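
The two limiting cases can be checked numerically in the same discrete m = 2 analogue (a sketch, not the exact smoothing spline): λ = 0 reproduces the data, and λ → ∞ approaches the least squares straight line, i.e. the polynomial of degree m − 1 = 1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
t = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.2, size=n)

D = np.zeros((n - 2, n))
for i in range(n - 2):
    D[i, i:i + 3] = [1.0, -2.0, 1.0]

g0 = np.linalg.solve(np.eye(n) + 0.0 * (D.T @ D), y)    # lam = 0
ginf = np.linalg.solve(np.eye(n) + 1e9 * (D.T @ D), y)  # lam very large

line = np.polyval(np.polyfit(t, y, deg=1), t)           # LS straight line
print(np.abs(g0 - y).max())      # 0: the fit interpolates the data
print(np.abs(ginf - line).max()) # small: close to the degree m-1 polynomial
```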

10 Plan: Section 2, Examples

11 Examples
Example of the impact of different values of λ on ĝ_{n,λ} and its derivative ĝ′_{n,λ}, for y_i = g(t_i) + ɛ_i, where g(t) = sin(πt/180), t ∈ [0, 360].

12 Plan: Section 2, Defining the Optimal λ

13 Defining the Optimal λ: True Mean Square Error R(λ) and the Optimal λ
Using the notation above, we define the true mean square error R(λ) as

R(λ) := (1/n) Σ_{i=1}^n (ĝ_{n,λ}(t_i) − g(t_i))²   (4)

The optimal λ is then defined as

λ* = argmin_{λ ∈ ℝ₊} R(λ)   (5)
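
In a simulation, where g is known, R(λ) from (4) can be evaluated directly and λ* read off a grid; a sketch using the same discrete smoother as above:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
t = np.linspace(0.0, 360.0, n)
g = np.sin(np.pi * t / 180.0)                 # known true function
y = g + rng.normal(scale=0.2, size=n)

D = np.zeros((n - 2, n))
for i in range(n - 2):
    D[i, i:i + 3] = [1.0, -2.0, 1.0]
P = D.T @ D

lams = np.logspace(-2, 6, 81)
R = [((np.linalg.solve(np.eye(n) + lam * P, y) - g) ** 2).mean()
     for lam in lams]
# lam* is available here only because g itself is known.
print(lams[int(np.argmin(R))], min(R))
```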

14 Plan: Section 3, Generalized Cross-Validation (GCV)

16 Intuition
Given our sample of n measurements, we leave one measurement out and predict that data point using the n − 1 remaining measurements. The idea is that the best model for the measurements is the one that best predicts each measurement as a function of the others.

17 The Ordinary Cross-Validation (OCV)
Definition. Let ĝ^[k]_{n,λ}, k ∈ {1, 2, ..., n}, be defined as

ĝ^[k]_{n,λ} = argmin_{ĝ ∈ W₂^(m)} [ Σ_{i=1, i≠k}^n (y_i − ĝ(t_i))² + λ ∫₀¹ (ĝ^(m)(t))² dt ],  λ ≥ 0.   (6)

Then we define the OCV mean square error as

V₀(λ) = (1/n) Σ_{i=1}^n (ĝ^[i]_{n,λ}(t_i) − y_i)²   (7)

18 The Ordinary Cross-Validation (OCV)
Definition. Let V₀(λ) be defined as in the previous definition. The Ordinary Cross-Validation estimate of λ is

λ₀ = argmin_{λ ∈ ℝ₊} V₀(λ)   (8)
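
Definitions (6)-(8) can be evaluated by brute force in the discrete analogue: leaving out observation k amounts to zeroing its weight in the data-fit term and refitting. The weighting trick and all names below are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
t = np.linspace(0.0, 360.0, n)
y = np.sin(np.pi * t / 180.0) + rng.normal(scale=0.2, size=n)

D = np.zeros((n - 2, n))
for i in range(n - 2):
    D[i, i:i + 3] = [1.0, -2.0, 1.0]
P = D.T @ D

def V0(lam: float) -> float:
    """OCV score (7): average squared leave-one-out prediction error."""
    errs = []
    for k in range(n):
        W = np.eye(n)
        W[k, k] = 0.0                              # drop observation k
        g_k = np.linalg.solve(W + lam * P, W @ y)  # fit on the other n-1 points
        errs.append((g_k[k] - y[k]) ** 2)
    return float(np.mean(errs))

lams = np.logspace(-2, 6, 41)
print(lams[int(np.argmin([V0(l) for l in lams]))])  # the OCV estimate (8)
```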

19 Why we need more
The OCV estimate has proven accurate for certain periodic functions, but in general it lacks a feature shown by the following. Consider a periodic function g ∈ W₂^(m), g(t) = g(t + 1), sampled equidistantly, e.g. t_j = j/n, j ∈ {1, 2, ..., n}, with some uniform noise added. In this case all data points are treated symmetrically. On the other hand, dropping the conditions of periodicity and equidistant sampling, the points interact differently, giving rise to the need to weight the samples differently.

20 Plan: Section 3, Generalized Cross-Validation (GCV)

21 (GCV) Rewriting V₀
There exists an n × n matrix A(λ), the influence matrix, with the property

(ĝ_{n,λ}(t₁), ĝ_{n,λ}(t₂), ..., ĝ_{n,λ}(t_n))ᵀ = A(λ) y,   (9)

such that V₀(λ) can be rewritten as

V₀(λ) = (1/n) Σ_{k=1}^n (Σ_{j=1}^n a_{kj} y_j − y_k)² / (1 − a_{kk})²,   (10)

where a_{kj}, k, j ∈ {1, 2, ..., n}, is element {k, j} of A(λ).
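
In the discrete analogue the influence matrix is explicit, A(λ) = (I + λDᵀD)⁻¹, so the shortcut (10) can be checked against the brute-force definition (7); a sketch under the same assumptions as before:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 40
y = np.sin(np.linspace(0.0, 2 * np.pi, n)) + rng.normal(scale=0.2, size=n)

D = np.zeros((n - 2, n))
for i in range(n - 2):
    D[i, i:i + 3] = [1.0, -2.0, 1.0]
P, lam = D.T @ D, 50.0

A = np.linalg.inv(np.eye(n) + lam * P)    # influence matrix A(lam), as in (9)
fit = A @ y

# Shortcut (10): no refitting, only the diagonal of A is needed.
V0_shortcut = np.mean(((fit - y) / (1.0 - np.diag(A))) ** 2)

# Brute force (7): refit with each observation left out in turn.
errs = []
for k in range(n):
    W = np.eye(n)
    W[k, k] = 0.0
    g_k = np.linalg.solve(W + lam * P, W @ y)
    errs.append((g_k[k] - y[k]) ** 2)
print(V0_shortcut, float(np.mean(errs)))  # agree up to rounding error
```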

22 (GCV) The Generalized Cross-Validation (GCV)
Definition. Let A(λ) be the influence matrix defined above. The GCV function is defined as

V(λ) = ( (1/n) ‖(I − A(λ)) y‖² ) / ( (1/n) tr(I − A(λ)) )²   (11)

We say that the Generalized Cross-Validation estimate of λ is

λ̂ = argmin_{λ ∈ ℝ₊} V(λ)   (12)
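
Since (11) needs only the residual norm and the trace of I − A(λ), the GCV estimate can be computed without any refitting; a sketch in the same assumed discrete setting:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
t = np.linspace(0.0, 360.0, n)
y = np.sin(np.pi * t / 180.0) + rng.normal(scale=0.2, size=n)

D = np.zeros((n - 2, n))
for i in range(n - 2):
    D[i, i:i + 3] = [1.0, -2.0, 1.0]
P = D.T @ D

def gcv(lam: float) -> float:
    """GCV function (11) for the discrete smoother."""
    A = np.linalg.inv(np.eye(n) + lam * P)
    r = y - A @ y
    return (r @ r / n) / (np.trace(np.eye(n) - A) / n) ** 2

lams = np.logspace(-2, 6, 81)
# The GCV estimate (12): needs neither g nor sigma.
print(lams[int(np.argmin([gcv(l) for l in lams]))])
```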

23 (GCV) Similarity to OCV
By rewriting V(λ) we get

V(λ) = (1/n) Σ_{k=1}^n (ĝ^[k]_{n,λ}(t_k) − y_k)² w_k(λ),   (13)

where the weights w_k(λ) are given by

w_k(λ) = ( (1 − a_{kk}(λ)) / ( (1/n) tr(I − A(λ)) ) )².   (14)

The weights w_k(λ) have been shown to adjust V₀(λ) for periodicity and non-equidistant sampling.
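
The identity (13)-(14) is exact for linear smoothers of this form, which a few lines verify numerically (same assumed discrete setup as the earlier sketches):

```python
import numpy as np

rng = np.random.default_rng(8)
n, lam = 40, 50.0
y = np.sin(np.linspace(0.0, 2 * np.pi, n)) + rng.normal(scale=0.2, size=n)

D = np.zeros((n - 2, n))
for i in range(n - 2):
    D[i, i:i + 3] = [1.0, -2.0, 1.0]
A = np.linalg.inv(np.eye(n) + lam * (D.T @ D))

loo = (A @ y - y) / (1.0 - np.diag(A))       # leave-one-out errors via (10)
w = ((1.0 - np.diag(A)) / (np.trace(np.eye(n) - A) / n)) ** 2  # weights (14)
V_weighted = np.mean(loo ** 2 * w)           # right-hand side of (13)

r = y - A @ y
V_direct = (r @ r / n) / (np.trace(np.eye(n) - A) / n) ** 2    # definition (11)
print(V_weighted, V_direct)                  # identical up to rounding
```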

24 (GCV) Comparison
Comparison of the true mean square error R(λ) with the GCV function V(λ) for the example above: y_i = g(t_i) + ɛ_i, where g(t) = sin(πt/180), t ∈ [0, 360].

25 Plan: Section 3, Convergence Result

26 Convergence Result
Theorem. Let g ∈ W₂^(m) and let {t_i}_{i=1}^n satisfy ∫₀^{t_i} w(u) du = i/n, where w(u) is a strictly positive continuous weight function. Then there exist sequences {λ̃_n}_{n=1}^∞ and {λ*_n}_{n=1}^∞ of minimizers of E[V(λ)] and E[R(λ)], respectively, such that the expectation inefficiency I satisfies

I = E[R(λ̃_n)] / E[R(λ*_n)] → 1  as n → ∞.   (15)

27 Plan: Section 4, Discussion

28 Discussion
- An implicit assumption is that the true function g(t) is smooth.
- GCV has been applied effectively to problems such as:
  - finding the right order of splines in regression;
  - regularized solution of Fredholm integral equations of the first kind.
- It provides an estimate of the regularization parameter from general assumptions on the data.
