Lena Leitenmaier & Parikshit Upadhyaya KTH - Royal Institute of Technology
1 Data mining with sparse grids Lena Leitenmaier & Parikshit Upadhyaya KTH - Royal Institute of Technology
2 Classification problem

Given: training data, a d-dimensional data set (feature space); d is typically relatively big (danger of the curse of dimensionality); a class label, e.g. in {-1, 1}, for each data point.

Goal: based on the training data, construct a classifier that can predict the class of any given new data point.

Strategies to solve the problem: neural networks, support vector machines, regularization networks, ...

Approach here: regularization network based, minimization on sparse grids.

Data mining with sparse grids P. Upadhyaya, L. Leitenmaier 2/31
3 More mathematical problem description

Given: a training set
\[ S = \{(x_i, y_i)\}_{i=1}^{M} \subset \mathbb{R}^d \times \mathbb{R}. \]

Assume the data were obtained by sampling an unknown function $f \in V$ defined over $\mathbb{R}^d$, and that the sampling process was disturbed by noise.

Goal: recover $f$ from the data as well as possible. To make the problem well-posed, add an additional smoothness constraint.
4 More mathematical problem description

Variational problem:
\[ \min_{f \in V} R(f), \qquad R(f) = \frac{1}{M} \sum_{i=1}^{M} C(f(x_i), y_i) + \lambda \Phi(f) \]

$C(\cdot,\cdot)$: cost function measuring the interpolation error
$\Phi(f)$: smoothness functional
$\lambda$: regularization parameter, balances the two terms

Example: $C(x, y) = (x - y)^2$, $\Phi(f) = \|\nabla f\|_2^2$.

$\lambda$ has to be chosen appropriately, e.g. according to cross-validation.
5 Discretization

Restriction to a finite dimensional subspace $V_N \subset V$:
\[ f_N = \sum_{j=1}^{N} \alpha_j \varphi_j(x) \]

$\{\alpha_j\}_{j=1}^{N}$: degrees of freedom
$\{\varphi_j\}_{j=1}^{N}$: a basis for $V_N$

Choose $C(f_N(x_i), y_i) = (f_N(x_i) - y_i)^2$ and $\Phi(f_N) = \|P f_N\|_{L^2}^2$.
6 Linear system

Minimizing
\[ R(f_N) = \frac{1}{M} \sum_{i=1}^{M} (f_N(x_i) - y_i)^2 + \lambda \|P f_N\|_{L^2}^2 \]
over $f_N \in V_N$ is equivalent to solving, for $k = 1, \dots, N$,
\[ \frac{1}{M} \sum_{j=1}^{N} \alpha_j \sum_{i=1}^{M} \varphi_j(x_i)\,\varphi_k(x_i) + \lambda \sum_{j=1}^{N} \alpha_j (P\varphi_j, P\varphi_k)_{L^2} = \frac{1}{M} \sum_{i=1}^{M} y_i\,\varphi_k(x_i). \]

In matrix form (after multiplying by $M$ and absorbing the factor into $\lambda$):
\[ (B B^T + \lambda C)\,\alpha = B y, \qquad C \in \mathbb{R}^{N \times N}, \quad B \in \mathbb{R}^{N \times M}. \]
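As a toy illustration, this system can be assembled and solved directly in one dimension (a minimal numpy sketch; the nodal hat basis, the choice $P = d/dx$, the scaling with $1/M$, and all parameter values are our assumptions for this example, not the setup used later in the talk):

```python
import numpy as np

def hat(x, h, j):
    """Nodal hat function centred at node j*h, with support [(j-1)h, (j+1)h]."""
    return np.maximum(0.0, 1.0 - np.abs(x / h - j))

def fit_1d(x, y, n=4, lam=1e-3):
    """Assemble and solve (B B^T / M + lam * C) alpha = B y / M on [0, 1]."""
    h = 2.0 ** -n
    N = 2 ** n + 1                       # grid nodes 0, h, 2h, ..., 1
    M = len(x)
    B = np.array([hat(x, h, j) for j in range(N)])   # B[j, i] = phi_j(x_i)
    # Stiffness matrix for P = d/dx: (1/h) * tridiag(-1, 2, -1), halved at the ends
    C = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h
    C[0, 0] = C[-1, -1] = 1.0 / h
    alpha = np.linalg.solve(B @ B.T / M + lam * C, B @ y / M)
    return alpha, h

# Fit noisy samples of a smooth function and recover its nodal coefficients.
rng = np.random.default_rng(0)
x = rng.random(200)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(200)
alpha, h = fit_1d(x, y)
```

The fitted function is then $f_N(x) = \sum_j \alpha_j \varphi_j(x)$; increasing $\lambda$ trades data fidelity for smoothness.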
7 Direct FE approximation

Naive approach: direct application of the finite element method, solving the arising linear system.

Assume an equidistant grid $\Omega_n$ with mesh size $h_n = 2^{-n}$ in each coordinate direction.

The number of grid points would be of the order $O(h_n^{-d}) = O(2^{nd})$: not feasible!

Matrix sizes: $C$ is $(2^n + 1)^d \times (2^n + 1)^d$, $B$ is $(2^n + 1)^d \times M$.
8 Sparse grids

Full grid space: $V_n = \mathrm{span}\{\varphi_{n,j} : j_t = 0, \dots, 2^n,\ t = 1, \dots, d\}$.

With multi-indices $l = (l_1, \dots, l_d)$ and $j = (j_1, \dots, j_d)$, $j_t = 0, \dots, 2^{l_t}$:
\[ \varphi_{l,j}(x) = \prod_{t=1}^{d} \varphi_{l_t, j_t}(x_t), \]
where $\varphi_{l_t, j_t}(x_t)$ is the scaled and translated hat function.

\[ V_l = \mathrm{span}\{\varphi_{l,j} : j_t = 0, \dots, 2^{l_t},\ t = 1, \dots, d\} \]
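The tensor-product hat basis can be written down directly (a small numpy sketch; the function names are ours):

```python
import numpy as np

def hat_1d(xt, lt, jt):
    """Scaled, translated hat function: phi_{l,j}(x) = max(0, 1 - |2^l x - j|)."""
    return np.maximum(0.0, 1.0 - np.abs(2.0 ** lt * xt - jt))

def phi(x, l, j):
    """d-dimensional tensor product phi_{l,j}(x) = prod_t phi_{l_t,j_t}(x_t)."""
    out = 1.0
    for xt, lt, jt in zip(x, l, j):
        out *= hat_1d(xt, lt, jt)
    return out

# A basis function equals 1 at its own grid point (j_1 2^-l_1, j_2 2^-l_2):
val_at_node = phi((0.25, 0.5), l=(2, 1), j=(1, 1))   # grid point (1/4, 1/2) -> 1.0
val_at_edge = phi((0.50, 0.5), l=(2, 1), j=(1, 1))   # edge of support in x_1 -> 0.0
```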
10 Sparse grids

Difference spaces (hierarchical increments):
\[ W_l = V_l \setminus \bigcup_{t=1}^{d} V_{l - e_t} \]

Full grid space: $V_n = \bigoplus_{|l|_\infty \le n} W_l$.

Sparse grid space:
\[ V_n^{(s)} = \bigoplus_{|l|_1 \le n + d - 1} W_l, \qquad \dim(V_n^{(s)}) = O(2^n n^{d-1}). \]

Interpolation error: $\|f - f_n^{(s)}\|_{L^p} = O(h_n^2 \log(h_n^{-1})^{d-1})$.
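The growth rates $O(2^n n^{d-1})$ versus $O(2^{nd})$ can be checked by counting points (a stdlib sketch; we count interior points only, so each increment $W_l$ with $l_t \ge 1$ contributes $\prod_t 2^{l_t - 1}$ points, an assumption matching the boundary-free hierarchical basis):

```python
from itertools import product
from math import prod

def full_grid_points(n, d):
    """Interior points of the full grid V_n (mesh width 2^-n per direction)."""
    return (2 ** n - 1) ** d

def sparse_grid_points(n, d):
    """Interior points of V_n^(s): sum |W_l| over |l|_1 <= n + d - 1, l_t >= 1,
    where the increment W_l contributes prod_t 2^(l_t - 1) points."""
    return sum(prod(2 ** (lt - 1) for lt in l)
               for l in product(range(1, n + 1), repeat=d)
               if sum(l) <= n + d - 1)

# Compare the two counts in 2D as the level increases.
counts = [(n, full_grid_points(n, 2), sparse_grid_points(n, 2)) for n in (2, 4, 6)]
```

In one dimension both counts coincide; the savings appear for $d \ge 2$.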
11 Sparse grids

(figure)
12 Sparse grids

Sparse grid linear system: $\lambda C_s + B_s B_s^T$.

$C_s$ is $O(2^n n^{d-1}) \times O(2^n n^{d-1})$ instead of $2^{nd} \times 2^{nd}$.

$C_s$ is more densely populated than $C$; assembly is a difficult problem.

Remedy: the sparse grid combination technique.
15 Sparse grid combination technique

Idea: write the sparse grid interpolant as a linear combination of full grid interpolants, and discretize the problem independently on these full grids.

In 2D:
\[ \hat u^c_{(n,n)} = \sum_{i+j=n+1} u_{i,j} - \sum_{i+j=n} u_{i,j} \]

In 3D:
\[ \hat u^c_{(n,n,n)} = \sum_{i+j+k=n+2} u_{i,j,k} - 2 \sum_{i+j+k=n+1} u_{i,j,k} + \sum_{i+j+k=n} u_{i,j,k} \]
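The 2D formula can be tried out directly with bilinear full-grid interpolants (a numpy sketch; the test function and evaluation point are arbitrary choices of ours):

```python
import numpy as np

def interp2(f, l1, l2, x, y):
    """Piecewise-bilinear interpolant of f on the full grid with mesh widths
    2^-l1 and 2^-l2, evaluated at a point (x, y) in [0, 1)^2."""
    n1, n2 = 2 ** l1, 2 ** l2
    g = np.array([[f(i / n1, j / n2) for j in range(n2 + 1)]
                  for i in range(n1 + 1)])
    i, s = divmod(x * n1, 1.0)          # cell index and local coordinate
    j, t = divmod(y * n2, 1.0)
    i, j = int(i), int(j)
    return ((1 - s) * (1 - t) * g[i, j] + s * (1 - t) * g[i + 1, j]
            + (1 - s) * t * g[i, j + 1] + s * t * g[i + 1, j + 1])

def combination_2d(f, n, x, y):
    """u^c_(n,n) = sum_{i+j=n+1} u_{i,j} - sum_{i+j=n} u_{i,j}."""
    up = sum(interp2(f, i, n + 1 - i, x, y) for i in range(1, n + 1))
    low = sum(interp2(f, i, n - i, x, y) for i in range(1, n))
    return up - low

# For a bilinear f every u_{i,j} is exact, so the combination reproduces f:
# n terms enter with +1 and n-1 with -1, and the coefficients sum to 1.
f = lambda u, v: u * v
approx = combination_2d(f, 3, 0.3, 0.7)
```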
17 Sparse grid combination technique

Combination technique illustration, from "Hierarchization for the sparse grid combination technique" by Philip Hupp.
18 An error estimate

Necessary assumptions:
- The exact solution is sufficiently smooth.
- There are pointwise asymptotic error expansions for the discrete solutions. In 2D, for $i, j = 1, 2, \dots$:
\[ u - u_{i,j} = C_1(h_i)\,h_i^2 + C_2(h_j)\,h_j^2 + D(h_i, h_j)\,h_i^2 h_j^2 \]
- The coefficient functions $C_1, C_2, D$ are bounded: $|C_1(h_i)| \le \kappa$, $|C_2(h_j)| \le \kappa$, $|D(h_i, h_j)| \le \kappa$.
19 An error estimate

Difference between the exact and the SGC solution:
\[ u - \hat u^c_{n,n} = u - \Big( \sum_{i+j=n+1} u_{i,j} - \sum_{i+j=n} u_{i,j} \Big) = \sum_{i+j=n+1} (u - u_{i,j}) - \sum_{i+j=n} (u - u_{i,j}), \]
where the first sum has $n$ terms and the second $n-1$, so the coefficients of $u$ add up to $1$.

Put in the error expansion:
\[ u - \hat u^c_{n,n} = \sum_{i+j=n+1} \big( C_1 h_i^2 + C_2 h_j^2 + D h_i^2 h_j^2 \big) - \sum_{i+j=n} \big( C_1 h_i^2 + C_2 h_j^2 + D h_i^2 h_j^2 \big) \]
\[ = C_1 h_n^2 + C_2 h_n^2 + \sum_{i+j=n+1} D h_i^2 h_j^2 - \sum_{i+j=n} D h_i^2 h_j^2, \]
since the $C_1$ and $C_2$ contributions telescope, leaving only the level-$n$ terms.
20 An error estimate

Use the definition of the mesh width, $h_i = 2^{-i}$, $h_j = 2^{-j}$:
\[ h_i^2 h_j^2 = \begin{cases} 2^{-2i}\,2^{-2(n+1-i)} = 2^{-2n-2} = \tfrac{1}{4} h_n^2 & i + j = n + 1, \\ 2^{-2i}\,2^{-2(n-i)} = 2^{-2n} = h_n^2 & i + j = n, \end{cases} \]
hence
\[ u - \hat u^c_{n,n} = \Big( C_1 + C_2 + \tfrac{1}{4} \sum_{i+j=n+1} D - \sum_{i+j=n} D \Big) h_n^2. \]

Use the boundedness of the coefficient functions:
\[ |u - \hat u^c_{n,n}| \le \Big( \kappa + \kappa + \tfrac{n}{4}\kappa + (n-1)\kappa \Big) h_n^2 = \big( 1 + \tfrac{5}{4} n \big) \kappa\, h_n^2. \]
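The two mesh-width identities are exact in binary floating point and easy to check numerically (a quick sketch of ours, not part of the derivation):

```python
# h_i = 2^-i; verify h_i^2 h_j^2 = h_n^2 / 4 on the diagonal i + j = n + 1
# and h_i^2 h_j^2 = h_n^2 on i + j = n (exact, since all values are powers of two).
def h(i):
    return 2.0 ** -i

n = 6
on_upper = [h(i) ** 2 * h(n + 1 - i) ** 2 for i in range(1, n + 1)]
on_lower = [h(i) ** 2 * h(n - i) ** 2 for i in range(1, n)]
```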
21 An error estimate

Error estimate in 2D:
\[ |u - \hat u^c_{n,n}| \le \big( 1 + \tfrac{5}{4} n \big) \kappa\, h_n^2 = O(h_n^2 \log_2(h_n^{-1})) \quad (1) \]
since $n = \log_2(2^n) = \log_2(h_n^{-1})$.

Error estimate in 3D:
\[ |u - \hat u^c_{n,n,n}| = O(h_n^2 \log_2(h_n^{-1})^2) \quad (2) \]
is obtained using the same steps; the prefactor is now a quadratic polynomial in $n$.

This can be generalized to higher dimensions.
22 SGC for the classification problem

Solve the problem on a sequence of grids $\Omega_l$: $O(d n^{d-1})$ linear systems
\[ (\lambda C_l + B_l B_l^T)\,\alpha_l = B_l y, \]
where $(C_l)_{j,k} = M\,(P\varphi_{l,j}, P\varphi_{l,k})_{L^2}$ and $(B_l)_{j,i} = \varphi_{l,j}(x_i)$.

Each system has size $\dim(V_l) = O(2^n)$.
23 SGC for the classification problem

Put together the combination solution $f_n^{(c)}$:
\[ f_n^{(c)}(x) = \sum_{q=0}^{d-1} (-1)^q \binom{d-1}{q} \sum_{|l|_1 = n + (d-1) - q} f_l(x), \]
where $f_l$ is the solution on a full grid,
\[ f_l(x) = \sum_j \alpha_{l,j}\,\varphi_{l,j}(x) \in V_l. \]

Any linear operation $F$ on $f_n^{(c)}$ can be expressed by means of the combination formula:
\[ F(f_n^{(c)}) = \sum_{q=0}^{d-1} (-1)^q \binom{d-1}{q} \sum_{|l|_1 = n + (d-1) - q} F(f_l). \]
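The levels and coefficients appearing in this formula can be enumerated with the standard library (a sketch; the function name is ours):

```python
from itertools import product
from math import comb

def combination_terms(n, d):
    """(level, coefficient) pairs in the combination formula
    f_n^(c) = sum_q (-1)^q binom(d-1, q) sum_{|l|_1 = n+(d-1)-q} f_l."""
    terms = []
    for q in range(d):
        coeff = (-1) ** q * comb(d - 1, q)
        for l in product(range(1, n + d), repeat=d):
            if sum(l) == n + (d - 1) - q:
                terms.append((l, coeff))
    return terms

# In 2D this reduces to the +1/-1 combination of the earlier slide; in any
# dimension the coefficients sum to 1, so constants are reproduced exactly.
coeff_sum_3d = sum(c for _, c in combination_terms(3, 3))
```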
24 Some implementational aspects

Training: go through all grids and solve the small linear systems.

Algorithm 1 Training
1: for q = 0 : d-1
2:   for l_1 = 1 : n-q
3:     for ...
4:       l_d = n - q - (l_1 - 1) - ... - (l_{d-1} - 1)
5:       Assemble C_l, B_l
6:       Solve (λ C_l + B_l B_l^T) α_l = B_l y
7:       Save α_l

We use preconditioned CG to solve the systems. This is the slow part.
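Steps 5-6 can be sketched with a conjugate-gradient solver (a numpy sketch of ours: plain CG without the preconditioning used in the talk, and random placeholder matrices rather than an actual subgrid system):

```python
import numpy as np

def cg(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients for a symmetric positive definite system A x = b."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:      # converged
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# One "subgrid" system (lam C_l + B_l B_l^T) alpha_l = B_l y with placeholders.
rng = np.random.default_rng(1)
B_l = rng.random((9, 50))              # N = 9 basis functions, M = 50 data points
y = rng.random(50)
C_l = np.eye(9)                        # stand-in for the SPD penalty matrix
lam = 0.1
A = lam * C_l + B_l @ B_l.T
alpha_l = cg(A, B_l @ y)
```

Since $\lambda C_l + B_l B_l^T$ is symmetric positive definite, CG is applicable to every subgrid system.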
25 Some implementational aspects

Testing: use the saved $\alpha_l$'s to compute labels for the given data.

Algorithm 2 Testing
1: for q = 0 : d-1
2:   for l_1 = 1 : n-q
3:     for ...
4:       Evaluate f_l(x) = Σ_{j_1=0}^{2^{l_1}} ... Σ_{j_d=0}^{2^{l_d}} α_{l,j} φ_{l,j}(x)
5:       Add the weighted contribution to the overall solution

Function evaluation, too, is needed only on the subgrids. Once $\alpha_l$ has been computed, it can be reused for any test data.
26 Computational complexity

(Theoretical) cost:
- Assembly of $C_l$: $O(3^d N)$, where $N = 2^n$ is the number of unknowns
- Assembly of $B_l$: $O(NM)$ [$O(d\,2^d M)$], where $M$ is the number of data points
- Total number of subgrids: $O(d n^{d-1})$
- Overall assembly cost: $O(d n^{d-1}(3^d N + d\,2^d M))$
- Similar cost for solving the systems
- (Function evaluation: $O(n^{d-1} N M)$)
27 Numerical results

Checkerboard example in 2D: 1000 data points, 750 training and 250 test points.
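A dataset of this kind is easy to generate (our own sketch; the number of checkerboard cells and the exact construction used on the slides are not specified, so these are guesses):

```python
import numpy as np

def checkerboard(n_points, n_cells=4, seed=0):
    """2D points in [0, 1)^2 with label +1 or -1 given by the parity of the
    checkerboard cell each point falls into."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_points, 2))
    cells = np.floor(x * n_cells).astype(int)
    labels = np.where(cells.sum(axis=1) % 2 == 0, 1, -1)
    return x, labels

# 1000 points, split 750 training / 250 test as on the slide.
x, y = checkerboard(1000)
x_train, y_train = x[:750], y[:750]
x_test, y_test = x[750:], y[750:]
```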
28 Checkerboard example

# levels   training accuracy   test accuracy
                  %                82.4 %
                  %                84.4 %
                  %                85.6 %
                  %                90.8 %
                  %                92.0 %

Results for the checkerboard example with λ =
29 A second example

Our classifier (with λ = 0.001): test accuracy of 91 % for data set 1, 89 % for data set 2, 86 % for data set 3.

For (a rough) comparison:
- Neural net: 82 %, 89 %, 88 %
- Gaussian process: 90 %, 89 %, 88 % (slower)
30 A second example

The influence of λ. First synthetic data set:

        λ =                 λ = 1e-5
 n    train    test       train    test
31 Computational cost

Dependence on the number of levels.

(Figure: scaling of computational time with the number of levels.)
32 Computational cost

Dependence on the number of data points.

(Figure: computational time scales linearly with the number of data points.)
33 8-dimensional example: Diabetes dataset

# levels   training accuracy   test accuracy    λ
                  %                79 %         1e
                  %                80.1 %       1e
                  %                82.3 %       1e
                  %                83.8 %       1e-4

Results for the 8D dataset: 768 instances with 8 features (like body mass index, diastolic blood pressure, etc.) and a class label; 700 training points, 68 test points.

Highly sensitive to the choice of training and test data.
34 Finally... Questions and Remarks?
35 References

- J. Garcke, M. Griebel and M. Thess, Data Mining with Sparse Grids, Computing, 2001.
- J. Garcke, Sparse Grids in a Nutshell, Springer.
- M. Griebel, M. Schneider and C. Zenger, A combination technique for the solution of sparse grid problems, 1990.
More informationMINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava
MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS Maya Gupta, Luca Cazzanti, and Santosh Srivastava University of Washington Dept. of Electrical Engineering Seattle,
More informationEAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science
EAD 115 Numerical Solution of Engineering and Scientific Problems David M. Rocke Department of Applied Science Taylor s Theorem Can often approximate a function by a polynomial The error in the approximation
More informationMachine Learning 2nd Edition
INTRODUCTION TO Lecture Slides for Machine Learning 2nd Edition ETHEM ALPAYDIN, modified by Leonardo Bobadilla and some parts from http://www.cs.tau.ac.il/~apartzin/machinelearning/ The MIT Press, 2010
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationConsider the following example of a linear system:
LINEAR SYSTEMS Consider the following example of a linear system: Its unique solution is x + 2x 2 + 3x 3 = 5 x + x 3 = 3 3x + x 2 + 3x 3 = 3 x =, x 2 = 0, x 3 = 2 In general we want to solve n equations
More information4. Numerical Quadrature. Where analytical abilities end... continued
4. Numerical Quadrature Where analytical abilities end... continued Where analytical abilities end... continued, November 30, 22 1 4.3. Extrapolation Increasing the Order Using Linear Combinations Once
More informationAlgorithmisches Lernen/Machine Learning
Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines
More informationThe Perceptron Algorithm
The Perceptron Algorithm Greg Grudic Greg Grudic Machine Learning Questions? Greg Grudic Machine Learning 2 Binary Classification A binary classifier is a mapping from a set of d inputs to a single output
More informationSolving PDEs with Multigrid Methods p.1
Solving PDEs with Multigrid Methods Scott MacLachlan maclachl@colorado.edu Department of Applied Mathematics, University of Colorado at Boulder Solving PDEs with Multigrid Methods p.1 Support and Collaboration
More informationAlgorithms for Scientific Computing
Algorithms for Scientific Computing Finite Element Methods Michael Bader Technical University of Munich Summer 2016 Part I Looking Back: Discrete Models for Heat Transfer and the Poisson Equation Modelling
More informationSemi-Supervised Learning in Gigantic Image Collections. Rob Fergus (New York University) Yair Weiss (Hebrew University) Antonio Torralba (MIT)
Semi-Supervised Learning in Gigantic Image Collections Rob Fergus (New York University) Yair Weiss (Hebrew University) Antonio Torralba (MIT) Gigantic Image Collections What does the world look like? High
More informationMining Classification Knowledge
Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology SE lecture revision 2013 Outline 1. Bayesian classification
More informationNonlinear Models. Numerical Methods for Deep Learning. Lars Ruthotto. Departments of Mathematics and Computer Science, Emory University.
Nonlinear Models Numerical Methods for Deep Learning Lars Ruthotto Departments of Mathematics and Computer Science, Emory University Intro 1 Course Overview Intro 2 Course Overview Lecture 1: Linear Models
More informationSolving Boundary Value Problems (with Gaussians)
What is a boundary value problem? Solving Boundary Value Problems (with Gaussians) Definition A differential equation with constraints on the boundary Michael McCourt Division Argonne National Laboratory
More informationChapter 13. Eddy Diffusivity
Chapter 13 Eddy Diffusivity Glenn introduced the mean field approximation of turbulence in two-layer quasigesotrophic turbulence. In that approximation one must solve the zonally averaged equations for
More informationStatistical Methods for NLP
Statistical Methods for NLP Text Categorization, Support Vector Machines Sameer Maskey Announcement Reading Assignments Will be posted online tonight Homework 1 Assigned and available from the course website
More informationNumerical Solution I
Numerical Solution I Stationary Flow R. Kornhuber (FU Berlin) Summerschool Modelling of mass and energy transport in porous media with practical applications October 8-12, 2018 Schedule Classical Solutions
More information