Computation time/accuracy trade-off and linear regression

1 Computation time/accuracy trade-off and linear regression
Maxime BRUNIN & Christophe BIERNACKI & Alain CELISSE
Laboratoire Paul Painlevé, Université de Lille, Science et Technologie
INRIA Lille-Nord Europe, MODAL team
3-7 April 2017
BRUNIN Maxime 1 / 19

4 Linear regression
General goal: find an estimator that performs better than the usual estimator in terms of MSE and/or computation time.
Linear regression model: $Y = X\theta^* + \varepsilon$, with
- $X \in \mathcal{M}_{n,d}(\mathbb{R})$ with $\operatorname{rank}(X) = d$;
- $\theta^* \in \mathbb{R}^d$ unknown;
- $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$.
We are in the case $n > d$. We usually use the Ordinary Least Squares (OLS) estimator $\hat\theta$ to estimate $\theta^*$.
Goal specific to linear regression: find an estimator that performs better than $\hat\theta$ in terms of MSE.

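6 Gradient descent algorithm
Objective function: $g(\theta) = \frac{1}{2n}\|Y - X\theta\|_{2,n}^2$ for $\theta \in \mathbb{R}^d$.
Gradient descent with fixed step $\alpha$: for all $k \ge 0$,
$$\hat\theta^{(k+1)} = \hat\theta^{(k)} - \alpha \nabla g(\hat\theta^{(k)}),$$
$$\hat\theta^{(k)} = \Big(I_d - \big(I_d - \tfrac{\alpha}{n} X^T X\big)^k\Big)\hat\theta + \big(I_d - \tfrac{\alpha}{n} X^T X\big)^k \theta^{(0)},$$
$$\hat Y^{(k)} = X\hat\theta^{(k)}.$$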
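The recursion and its closed form on this slide can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the function names, the simulated data, the noise level 0.5 and the step $\alpha = 0.9/\hat\lambda_1$ are all assumptions made for illustration.

```python
import numpy as np

def gradient_descent(X, Y, alpha, n_iter, theta0=None):
    """Fixed-step gradient descent on g(theta) = ||Y - X theta||^2 / (2n)."""
    n, d = X.shape
    theta = np.zeros(d) if theta0 is None else np.array(theta0, dtype=float)
    for _ in range(n_iter):
        grad = -X.T @ (Y - X @ theta) / n      # gradient of g at theta
        theta = theta - alpha * grad
    return theta

def closed_form_iterate(X, Y, alpha, k, theta0=None):
    """k-th iterate as (I - A^k) theta_ols + A^k theta0, with A = I - (alpha/n) X^T X."""
    n, d = X.shape
    theta0 = np.zeros(d) if theta0 is None else np.array(theta0, dtype=float)
    theta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
    A_k = np.linalg.matrix_power(np.eye(d) - alpha * X.T @ X / n, k)
    return (np.eye(d) - A_k) @ theta_ols + A_k @ theta0

# Simulated data (assumed sizes and noise level, for illustration only).
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d)
Y = X @ theta_star + 0.5 * rng.standard_normal(n)

# The largest eigenvalue of X^T X / n equals that of K = X X^T / n.
lam1 = np.linalg.eigvalsh(X.T @ X / n).max()
alpha = 0.9 / lam1                              # any step in (0, 1/lam1) would do

theta_gd = gradient_descent(X, Y, alpha, 30)
theta_cf = closed_form_iterate(X, Y, alpha, 30)
```

Both routes should give the same 30th iterate up to floating-point error, which is what the closed form claims.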

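7 Accuracy of our estimator
We assess the accuracy of $\hat\theta^{(k)}$ by the quadratic loss $\frac{1}{n}\|\hat Y^{(k)} - Y^*\|_{2,n}^2$ or by its expectation
$$\mathrm{MSE}(\hat Y^{(k)}) = \mathbb{E}\Big[\tfrac{1}{n}\|\hat Y^{(k)} - Y^*\|_{2,n}^2\Big],$$
where $Y^* = X\theta^*$.
Property (Brunin (2016)): if $k^* = \operatorname{argmin}_{k \in \mathbb{N}} \{\mathrm{MSE}(\hat Y^{(k)})\}$, then
$$\mathrm{MSE}(\hat Y^{(k^*)}) < \mathrm{MSE}(\hat Y),$$
where $\hat Y = X\hat\theta$.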

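8 Trade-off bias variance (1/2)
[Figure, for $d = 20$: curves of $\mathrm{MSE}(\hat Y^{(k)})$, $\mathrm{MSE}(\hat Y)$, $\mathrm{bias}(\hat Y^{(k)})^2$ and $\mathrm{var}(\hat Y^{(k)})$.]
$$\mathrm{MSE}(\hat Y^{(k)}) = \underbrace{\tfrac{1}{n}\big\|S^k P^T (Y^{(0)} - Y^*)\big\|_{2,n}^2}_{\mathrm{bias}(\hat Y^{(k)})^2} + \underbrace{\tfrac{\sigma^2}{n}\operatorname{Tr}\big((I_n - S^k)^2\big)}_{\mathrm{var}(\hat Y^{(k)})},$$
where $K = \frac{1}{n} X X^T = P \Lambda P^T$; $S = I_n - \alpha\Lambda$; $0 < \alpha < 1/\hat\lambda_1$; $\hat\lambda_1 = \|K\|_2$.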
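The bias/variance decomposition above can be evaluated directly from the eigendecomposition of $K$. The sketch below is illustrative only: the sample size `n = 100`, the noise variance, the normalised `theta_star` and the starting point $Y^{(0)} = 0$ are assumptions, and `bias_variance_curves` is a hypothetical helper name.

```python
import numpy as np

def bias_variance_curves(X, Y_star, sigma2, alpha, k_max, Y0=None):
    """Return (bias2, var), two arrays indexed by k = 0..k_max."""
    n = X.shape[0]
    Y0 = np.zeros(n) if Y0 is None else Y0
    lam, P = np.linalg.eigh(X @ X.T / n)          # K = P diag(lam) P^T
    s = 1.0 - alpha * np.clip(lam, 0.0, None)     # diagonal of S = I_n - alpha * Lambda
    c = P.T @ (Y0 - Y_star)                       # P^T (Y^(0) - Y*)
    ks = np.arange(k_max + 1)[:, None]
    s_pow = s[None, :] ** ks                      # row k holds the diagonal of S^k
    bias2 = ((s_pow * c) ** 2).sum(axis=1) / n
    var = sigma2 * ((1.0 - s_pow) ** 2).sum(axis=1) / n
    return bias2, var

rng = np.random.default_rng(0)
n, d, sigma2 = 100, 20, 1.0                       # n = 100 is an assumed value
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)
Y_star = X @ theta_star

lam1 = np.linalg.eigvalsh(X @ X.T / n).max()
alpha = 0.9 / lam1                                # inside (0, 1/lam1)

bias2, var = bias_variance_curves(X, Y_star, sigma2, alpha, k_max=300)
mse = bias2 + var
k_star = int(np.argmin(mse))                      # oracle stopping time on the grid
mse_ols = sigma2 * d / n                          # MSE of the OLS fit X theta_hat
```

On such simulated data the squared bias decreases with $k$ while the variance increases towards $\sigma^2 d / n$, so the minimum of `mse` sits at a finite `k_star`.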

9 Trade-off bias variance (2/2)
[Figure, for $d = 20$: the expected quadratic loss of $\hat Y^{(k)}$ together with three realizations of the loss, numbered 1, 2 and 3.]
Here $\alpha \in \left]0, 1/\hat\lambda_1\right[$.

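10 Estimating $k^* = \operatorname{argmin}_{k \in \mathbb{N}} \{\mathrm{MSE}(\hat Y^{(k)})\}$ (1/3)
Property (Brunin (2016)): if $k^* = \operatorname{argmin}_{k \in \mathbb{N}} \{\mathrm{MSE}(\hat Y^{(k)})\}$, then, with high probability, the quadratic loss of $\hat Y^{(k^*)}$ is smaller than that of $\hat Y$, where $\hat Y = X\hat\theta$.
Property (Brunin (2016)): there exist $M_1, M_2, M_3, M_4 > 0$ such that, with high probability, for large $n$,
$$M_1 + M_2 \log(n) \le k^* \le M_3 + M_4 \log(n).$$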

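11 Estimating $k^* = \operatorname{argmin}_{k \in \mathbb{N}} \{\mathrm{MSE}(\hat Y^{(k)})\}$ (2/3)
For all $k \ge 0$,
$$\tfrac{1}{n}\|\hat Y^{(k)} - Y^*\|_{2,n}^2 \le \underbrace{\tfrac{2}{n}\big\|\mathbb{E}[\hat Y^{(k)}] - Y^*\big\|_{2,n}^2}_{B_k^2} + \underbrace{\tfrac{2}{n}\big\|\hat Y^{(k)} - \mathbb{E}[\hat Y^{(k)}]\big\|_{2,n}^2}_{V_k}.$$
Lemma: if $\|\theta^*\|_{2,d} \le 1$ and $\theta^{(0)} = 0$, then for all $k \in \mathbb{N}$,
$$B_k^2 \le 2\hat\lambda_1 e^{-2\alpha\hat\lambda_d k} := B_{k,\sup}^2.$$
Lemma: there exists $C_1 > 0$ such that, with probability at least $1 - e^{-y}$, for all $k \in \{0, \dots, k_{\max}\}$,
$$V_k \le 2\mathbb{E}[V_k] + C_1 \frac{y + \log(k_{\max} + 1)}{n} \quad \text{and} \quad 2\mathbb{E}[V_k] \le \frac{4\sigma^2}{n} \sum_{j=1}^{d} \min\big\{1, (\alpha\hat\lambda_j k)^2\big\} := V_{k,\sup}.$$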
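The two deterministic bounds on this slide (the one on $B_k^2$ and the one on $2\mathbb{E}[V_k]$) can be checked on simulated data. This is a sketch under assumptions: the design, `n = 100`, `sigma2 = 1` and $\alpha = 0.9/\hat\lambda_1$ are illustrative choices, and $2\mathbb{E}[V_k]$ is computed from the variance term $\frac{\sigma^2}{n}\operatorname{Tr}((I_n - S^k)^2)$ of the bias/variance decomposition.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma2 = 100, 20, 1.0                   # assumed sizes and noise level
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)      # ||theta*|| = 1, and theta^(0) = 0

# Thin SVD: the nonzero eigenvalues of K = X X^T / n are sv**2 / n.
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
lam = sv**2 / n                               # descending: lam[0] = lambda_1, lam[-1] = lambda_d
alpha = 0.9 / lam[0]
c = sv * (Vt @ theta_star)                    # coordinates of Y* = X theta* in the basis U

ks = np.arange(0, 101)[:, None]
s_pow = (1.0 - alpha * lam)[None, :] ** ks    # (1 - alpha * lambda_j)^k

# Squared-bias term B_k^2 and its upper bound B_{k,sup}^2.
B2 = 2.0 / n * ((s_pow * c) ** 2).sum(axis=1)
B2_sup = 2.0 * lam[0] * np.exp(-2.0 * alpha * lam[-1] * ks[:, 0])

# 2 E[V_k] and its upper bound V_{k,sup}.
two_EV = 4.0 * sigma2 / n * ((1.0 - s_pow) ** 2).sum(axis=1)
V_sup = 4.0 * sigma2 / n * np.minimum(1.0, (alpha * lam[None, :] * ks) ** 2).sum(axis=1)
```

The variance bound rests on $1 - (1 - x)^k \le \min\{1, kx\}$ for $x \in [0, 1]$, and the bias bound on $(1 - x)^{2k} \le e^{-2kx}$.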

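12 Estimating $k^* = \operatorname{argmin}_{k \in \mathbb{N}} \{\mathrm{MSE}(\hat Y^{(k)})\}$ (3/3)
With probability at least $1 - e^{-y}$, for all $k \in \{0, \dots, k_{\max}\}$,
$$\tfrac{1}{n}\|\hat Y^{(k)} - Y^*\|_{2,n}^2 \le B_{k,\sup}^2 + 2\mathbb{E}[V_k] + C_1 \frac{y + \log(k_{\max} + 1)}{n}.$$
Stopping rules $\hat k_1$ and $\hat k_2$:
$$\hat k_1 = \min\big\{k \in \mathbb{N} : B_{k+1,\sup}^2 + 2\hat{\mathbb{E}}[V_{k+1}] > B_{k,\sup}^2 + 2\hat{\mathbb{E}}[V_k]\big\},$$
$$\hat k_2 = \min\big\{k \in \mathbb{N} : B_{k+1,\sup}^2 + \hat{\mathbb{E}}[V_{k+1}] > B_{k,\sup}^2 + \hat{\mathbb{E}}[V_k]\big\},$$
where $\hat{\mathbb{E}}[V_k] = \frac{2\hat\sigma^2}{n} \sum_{j=1}^{d} \big(1 - (1 - \alpha\hat\lambda_j)^k\big)^2$.
Property: for all $k \in \mathbb{N}$ (interesting when $k > \hat k_1$ or $k > \hat k_2$),
$$\mathbb{E}\Big[\tfrac{1}{n}\|\hat Y^{(k)} - Y^*\|_{2,n}^2\Big] \ge \frac{\sigma^2}{4n} \sum_{j=1}^{d} \min\big\{1, (\alpha\hat\lambda_j k)^2\big\}.$$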
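The stopping rules $\hat k_1$ and $\hat k_2$ only need the eigenvalues $\hat\lambda_j$ and a variance estimate $\hat\sigma^2$. The sketch below makes several assumptions that the slides do not state: $\hat\sigma^2$ is taken as the OLS residual variance with $n - d$ degrees of freedom, the search is capped at a hypothetical `k_max`, and the function and variable names are illustrative.

```python
import numpy as np

def stopping_rules(X, Y, alpha, k_max=2000):
    """Return (k1_hat, k2_hat): first k at which each criterion starts to increase."""
    n, d = X.shape
    lam = np.linalg.eigvalsh(X.T @ X / n)          # ascending: lam[0] = lambda_d, lam[-1] = lambda_1
    lam_d, lam_1 = lam[0], lam[-1]

    # Assumption: sigma^2 is estimated by the OLS residual variance.
    theta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ theta_ols
    sigma2_hat = resid @ resid / (n - d)

    ks = np.arange(k_max + 2)                      # k = 0..k_max+1, so that k+1 is available
    B2_sup = 2.0 * lam_1 * np.exp(-2.0 * alpha * lam_d * ks)
    s_pow = (1.0 - alpha * lam)[None, :] ** ks[:, None]
    EV_hat = 2.0 * sigma2_hat / n * ((1.0 - s_pow) ** 2).sum(axis=1)

    def first_increase(crit):
        idx = np.nonzero(crit[1:] > crit[:-1])[0]  # all k with crit[k+1] > crit[k]
        return int(idx[0]) if idx.size else None   # None if no increase before k_max

    return first_increase(B2_sup + 2.0 * EV_hat), first_increase(B2_sup + EV_hat)

rng = np.random.default_rng(0)
n, d = 100, 20                                     # assumed sizes
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)
Y = X @ theta_star + rng.standard_normal(n)

alpha = 0.9 / np.linalg.eigvalsh(X.T @ X / n).max()
k1_hat, k2_hat = stopping_rules(X, Y, alpha)
```

Because the first criterion puts twice the weight on the increasing variance term, it can only start increasing at the same iteration as the second criterion or earlier, so `k1_hat <= k2_hat`.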

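14 Relative gain as a function of n for d = 20
[Figure, for $d = 20$: relative gain as a function of $n$ for $k^*$, $\hat k_1$ and $\hat k_2$.]
For $\hat k \in \{k^*, \hat k_1, \hat k_2\}$,
$$\mathrm{GainRel}(\hat Y^{(\hat k)}) = \frac{\mathrm{MSE}(\hat Y) - \mathrm{MSE}(\hat Y^{(\hat k)})}{\mathrm{MSE}(\hat Y)}.$$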
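As an illustration of the relative gain, the sketch below evaluates it at the oracle $k^*$ for a few sample sizes, using the exact bias/variance expression of the MSE. Everything specific here is an assumption: the Gaussian design, the values of `n`, `sigma2 = 1`, the step $\alpha = 0.9/\hat\lambda_1$, the start $\theta^{(0)} = 0$ and the helper names. It is not a reproduction of the figure.

```python
import numpy as np

def mse_curve(X, theta_star, sigma2, alpha_frac=0.9, k_max=500):
    """Exact MSE(Y_hat^(k)) for k = 0..k_max, gradient descent started at theta^(0) = 0."""
    n = X.shape[0]
    _, sv, Vt = np.linalg.svd(X, full_matrices=False)
    lam = sv**2 / n                                # nonzero eigenvalues of K, descending
    alpha = alpha_frac / lam[0]
    c = sv * (Vt @ theta_star)                     # coordinates of Y* = X theta*
    s_pow = (1.0 - alpha * lam)[None, :] ** np.arange(k_max + 1)[:, None]
    bias2 = ((s_pow * c) ** 2).sum(axis=1) / n
    var = sigma2 * ((1.0 - s_pow) ** 2).sum(axis=1) / n
    return bias2 + var

def relative_gain(mse_ols, mse_k):
    """(MSE(Y_hat) - MSE(Y_hat^(k))) / MSE(Y_hat)."""
    return (mse_ols - mse_k) / mse_ols

rng = np.random.default_rng(2)
d, sigma2 = 20, 1.0
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)

gains = {}
for n in (50, 100, 200, 400):                      # assumed grid of sample sizes
    X = rng.standard_normal((n, d))
    mse = mse_curve(X, theta_star, sigma2)
    mse_ols = sigma2 * d / n                       # MSE of the OLS fit
    gains[n] = relative_gain(mse_ols, mse.min())   # gain at the oracle k*
```

A positive entry in `gains` means that stopping at $k^*$ beats OLS in MSE for that sample size; the same function can be applied to $\hat k_1$ or $\hat k_2$ by indexing `mse` at those iterations.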

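15 Relative gain as a function of n for d = 100
[Figure, for $d = 100$: relative gain as a function of $n$ for $k^*$, $\hat k_1$ and $\hat k_2$.]
For $\hat k \in \{k^*, \hat k_1, \hat k_2\}$,
$$\mathrm{GainRel}(\hat Y^{(\hat k)}) = \frac{\mathrm{MSE}(\hat Y) - \mathrm{MSE}(\hat Y^{(\hat k)})}{\mathrm{MSE}(\hat Y)}.$$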

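16 Perspective
With probability at least $1 - e^{-y}$, for all $k \in \{0, \dots, k_{\max}\}$,
$$\tfrac{1}{n}\|\hat Y^{(k)} - Y^*\|_{2,n}^2 \le B_{k,\sup}^2 + V_{k,\sup} + C_1 \frac{y + \log(k_{\max} + 1)}{n}.$$
Approach of (Raskutti, Wainwright, and Yu 2014) based on RKHS, stopping rule $\hat k_3$:
$$\hat k_3 = \max\Big\{k \in \mathbb{N} : V_{k,\sup} \le c\, B_{k,\sup}^2 \Big(\frac{\sigma}{\hat\sigma}\Big)^2\Big\}.$$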

17 General use of stopping rule
A stopping rule makes it possible to reduce computation time (without losing accuracy) in problems where $\hat\theta$ has no closed-form expression and needs many iterations to be computed.
For instance, for a mixture of two univariate Gaussians where only the proportion $p$ is unknown, we use an EM algorithm whose estimate at the $s$-th iteration is $\hat p^{(s)}$, with $\hat p^{(s)} \to \hat p$ as $s \to +\infty$.
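The mixture example can be made concrete with a short EM loop in which only the proportion is updated. This is a minimal sketch: the component means and standard deviations, the sample size, the true proportion and the tolerance-based stopping criterion are all assumptions chosen for illustration, and the criterion is a naive one, not the stopping rule studied in the talk.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def em_proportion(x, mu1, sd1, mu2, sd2, p0=0.5, tol=1e-8, max_iter=1000):
    """EM for p in the mixture p * N(mu1, sd1^2) + (1 - p) * N(mu2, sd2^2).

    Only p is unknown. Returns (estimate, number of iterations used).
    """
    f1, f2 = normal_pdf(x, mu1, sd1), normal_pdf(x, mu2, sd2)
    p = p0
    for s in range(1, max_iter + 1):
        resp = p * f1 / (p * f1 + (1.0 - p) * f2)   # E-step: posterior weight of component 1
        p_new = resp.mean()                          # M-step: update of the proportion
        if abs(p_new - p) < tol:                     # naive stopping criterion (assumption)
            return p_new, s
        p = p_new
    return p, max_iter

rng = np.random.default_rng(3)
n, p_true = 5000, 0.3                                # assumed sample size and true proportion
z = rng.random(n) < p_true                           # latent component labels
x = np.where(z, rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n))

p_hat, n_iter = em_proportion(x, 0.0, 1.0, 3.0, 1.0)
```

Each EM iteration costs one pass over the data, so a rule that stops after fewer iterations without degrading the estimate translates directly into saved computation time.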

18 Conclusion
Results: for some values of $n$, $\mathrm{MSE}(\hat Y^{(\hat k_1)}) < \mathrm{MSE}(\hat Y)$ and $\mathrm{MSE}(\hat Y^{(\hat k_2)}) < \mathrm{MSE}(\hat Y)$.
Perspectives:
- Prove theoretical results on $\hat k_1$ and $\hat k_2$.
- Study $\hat k_1$ and $\hat k_2$ in the framework of kernel methods.

19 Bibliography
Raskutti, Garvesh, Martin J. Wainwright, and Bin Yu (2014). Early Stopping and Non-Parametric Regression: An Optimal Data-Dependent Stopping Rule. In: J. Mach. Learn. Res. 15.1, pp.
