Symmetries in Experimental Design and Group Lasso Kentaro Tanaka and Masami Miyakawa

1 Symmetries in Experimental Design and Group Lasso
Kentaro Tanaka and Masami Miyakawa
Workshop on computational and algebraic methods in statistics, March 3-5, Sanjo Conference Hall, Hongo Campus, University of Tokyo

Abstract
The group lasso, a generalization of the lasso, has become a popular method of variable selection for linear regression and has a variety of applications. In this paper, we apply the group lasso to the design of experiments. First, to construct an optimal design, we enumerate the candidate design points. In many cases, cost makes it difficult to run an experiment at every candidate design point. It is therefore necessary to choose a subset of the design points with minimum loss of information. We explain that this procedure corresponds to variable selection in regression analysis and can be formulated as a group lasso problem. Finally, we give numerical examples showing that several orthogonal arrays can be obtained automatically by solving the group lasso problem.

2 Group Lasso

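3 Lasso (least absolute shrinkage and selection operator)

Lasso: linear regression with $L_1$ norm regularization.

Data: $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$
Model: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon$
Parameters (unknown): $\beta = (\beta_0, \dots, \beta_p)^T$
Design matrix: $X = (x_1, \dots, x_N)^T$

Linear regression:
$$\min_\beta \|y - X\beta\|^2 = \min_\beta \sum_{i=1}^N (y_i - x_i^T \beta)^2$$

Lasso:
$$\min_\beta \|y - X\beta\|^2 + \sum_{j=1}^p \lambda_j |\beta_j|$$
($\lambda_j$: tuning parameters; the second term is the $L_1$ norm regularization.)

[Figure: scatter plot of the data points $(x_1, y_1), \dots, (x_N, y_N)$ with the fitted regression line; axes $x$ and $y$.]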
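To make the lasso objective above concrete, here is a minimal coordinate-descent sketch in plain Python. It is an illustration only, not the authors' method: the function names `soft_threshold` and `lasso_cd`, the toy data, and the choice of coordinate descent as the solver are all assumptions. The intercept is left unpenalized by giving it $\lambda_0 = 0$, matching the sum over $j = 1, \dots, p$ in the slide.

```python
def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for min ||y - X b||^2 + sum_j lam[j] * |b[j]|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residual with coordinate j removed from the fit.
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            xr = sum(X[i][j] * r[i] for i in range(n))
            xx = sum(X[i][j] ** 2 for i in range(n))
            # Exact one-dimensional minimizer (note the factor 1/2 on lam[j]).
            beta[j] = soft_threshold(xr, lam[j] / 2.0) / xx
    return beta


# Toy data (made up): columns are intercept, x1, x2; y roughly follows x1.
X = [[1.0, -1.0, 1.0], [1.0, 0.0, -1.0], [1.0, 1.0, 1.0], [1.0, 2.0, -1.0]]
y = [-1.0, 0.1, 1.0, 1.9]
beta_ls = lasso_cd(X, y, lam=[0.0, 0.0, 0.0])      # plain least squares
beta_lasso = lasso_cd(X, y, lam=[0.0, 1.0, 1.0])   # intercept unpenalized
print(beta_ls)
print(beta_lasso)
```

On this toy data the least-squares fit gives a small non-zero coefficient to the third column, while the lasso fit sets it exactly to zero.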

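4 Sparseness

[Figures: graph of $|t|$ and graph of its subdifferential $\partial |t|$, both plotted against $t$.]

Let $f$ be a convex function on $\mathbb{R}$. The subdifferential of $f$ at $t_0$ is
$$\partial f(t_0) = \{\, c : f(t) - f(t_0) \ge c\,(t - t_0) \ \text{for all } t \,\}.$$
For $f(t) = |t|$:
$$\partial |t| = \begin{cases} \{-1\} & (t < 0) \\ [-1, 1] & (t = 0) \\ \{1\} & (t > 0) \end{cases}$$

Lasso: because $\partial |\beta_j|$ is the whole interval $[-1, 1]$ at $\beta_j = 0$, the optimality condition for
$$\|y - X\beta\|^2 + \sum_{j=1}^p \lambda_j |\beta_j|$$
can be satisfied with $\beta_j = 0$.

The elements of the solution vector tend to be zero: the lasso estimates sparse coefficients (parameters). (In statistics, we prefer models with fewer variables.)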
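The role of the subdifferential can be checked numerically on the one-dimensional problem $f(b) = (y - b)^2 + \lambda |b|$, where $b$ is a minimizer exactly when $0 \in -2(y - b) + \lambda\,\partial|b|$. The sketch below is illustrative; the helper names `subdiff_abs` and `is_minimizer` are hypothetical.

```python
def subdiff_abs(t):
    """Subdifferential of |t|, returned as a closed interval (lo, hi)."""
    if t < 0:
        return (-1.0, -1.0)
    if t > 0:
        return (1.0, 1.0)
    return (-1.0, 1.0)


def is_minimizer(b, y, lam, tol=1e-12):
    """Check 0 in -2*(y - b) + lam * subdiff|b| for f(b) = (y - b)^2 + lam*|b|."""
    lo, hi = subdiff_abs(b)
    grad = -2.0 * (y - b)
    return grad + lam * lo <= tol and grad + lam * hi >= -tol


print(is_minimizer(0.0, y=0.3, lam=1.0))   # |y| <= lam/2: b = 0 is optimal
print(is_minimizer(0.0, y=0.8, lam=1.0))   # |y| >  lam/2: b = 0 is not optimal
print(is_minimizer(0.3, y=0.8, lam=1.0))   # b = y - lam/2 is optimal instead
```

The first case shows the mechanism behind sparseness: for every $y$ with $|y| \le \lambda/2$, the minimizer is exactly zero rather than merely small.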

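5 Group Lasso

The group lasso performs variable selection at the group level. The coefficients are partitioned into $G$ groups:
$$\beta = (\underbrace{\beta_0, \dots, \beta_{i_1}}_{\beta_{I_1}},\ \underbrace{\beta_{i_1+1}, \dots, \beta_{i_2}}_{\beta_{I_2}},\ \dots,\ \underbrace{\beta_{i_{G-1}+1}, \dots, \beta_p}_{\beta_{I_G}})$$

Group lasso:
$$\min_\beta \|y - X\beta\|^2 + \sum_{g=1}^G \lambda_g \|\beta_{I_g}\|$$
Here $\|\cdot\|$ is the Euclidean norm (not squared!!):
$$\|\beta_{I_g}\| = \sqrt{\beta_{i_{g-1}+1}^2 + \cdots + \beta_{i_g}^2}$$
The whole block $\beta_{I_g}$ tends to be zero.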
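The effect of the unsquared Euclidean norm can be seen on the simplest group problem, $\min_b \|z - b\|^2 + \lambda \|b\|$, whose minimizer shrinks the whole vector $z$ toward zero and sets it exactly to zero when $\|z\| \le \lambda/2$. The sketch below is a hedged illustration of that closed form; the function name `group_soft_threshold` is an assumption.

```python
import math


def group_soft_threshold(z, lam):
    """Minimizer of ||z - b||^2 + lam * ||b|| (Euclidean norm, not squared)."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam / 2.0:
        return [0.0] * len(z)            # the whole group is set to zero
    scale = 1.0 - lam / (2.0 * norm)     # common shrinkage factor for the group
    return [scale * v for v in z]


small = group_soft_threshold([0.3, 0.4], lam=2.0)   # norm 0.5 <= lam/2
large = group_soft_threshold([3.0, 4.0], lam=2.0)   # norm 5.0, scale 0.8
print(small)
print(large)
```

All coordinates of a group are kept or dropped together, which is what "variable selection at the group level" means here.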

6 Design of Experiments

7 Design of Experiments

One purpose of the design of experiments is to construct the optimal experimental design with respect to a criterion, under constraints that reflect the real problem.

We consider the case where there are three factors $a$, $b$, $c$ and each factor has two levels, $-1$ and $+1$. There are $2^3 = 8$ design points.

[Table: levels of $a$, $b$, $c$ (rows) at the design points $1, \dots, 8$ (columns). Each column corresponds to an experiment.]

Choose fewer design points among $1, \dots, 8$ with the minimum loss of information.
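The $2^3 = 8$ candidate design points can be enumerated in a few lines. The column ordering used in the slides is not recoverable from the text, so the lexicographic ordering below (level $-1$ before $+1$, last factor varying fastest) is an assumption.

```python
from itertools import product

factors = ("a", "b", "c")
# Assumed ordering: lexicographic, level -1 before +1, last factor fastest.
design_points = list(product((-1, 1), repeat=len(factors)))

for g, point in enumerate(design_points, start=1):
    print(g, dict(zip(factors, point)))
```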

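8 The Problem Setting

Main effect model: $Y = \mu + \beta_a a + \beta_b b + \beta_c c + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$.

We have to solve the following two problems at the same time:
(1) Obtain good linear unbiased estimators of $\beta_a$, $\beta_b$, $\beta_c$.
(2) Choose fewer design points among $1, \dots, 8$.

[Table: levels of $a$, $b$, $c$ at the design points.]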

9 Linear Estimator

[Table: design matrix with rows intercept, $a$, $b$, $c$ and one column per design point; the corresponding outputs are $Y_1, Y_2, \dots, Y_8$.]

Linear estimators:
$$\hat\beta_a = \sum_{g} x_{ag} Y_g, \qquad \hat\beta_b = \sum_{g} x_{bg} Y_g, \qquad \hat\beta_c = \sum_{g} x_{cg} Y_g$$
where $x_{ag}$, $x_{bg}$, $x_{cg}$ are the weights of the linear estimators.

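10 Unbiasedness

Let $M$ be the matrix whose rows are the intercept, $a$, $b$, $c$ and whose columns are the design points.

The estimator $\hat\beta_a = \sum_g x_{ag} Y_g$ is unbiased, $E[\hat\beta_a] = \beta_a$, if and only if its weight vector $x_a = (x_{a1}, \dots, x_{a8})^T$ satisfies the equality constraint
$$M x_a = e_a = (0, 1, 0, 0)^T.$$
The same holds for $\hat\beta_b$ and $\hat\beta_c$.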

11 Unbiasedness

[Table: matrix $M$ with rows intercept, $a$, $b$, $c$; outputs $Y_1, Y_2, \dots, Y_8$.]

$$E[\hat\beta_a] = E\Big[\sum_g x_{ag} Y_g\Big] = (\mu, \beta_a, \beta_b, \beta_c)\, M x_a$$
If $M x_a = e_a = (0, 1, 0, 0)^T$, then
$$E[\hat\beta_a] = (\mu, \beta_a, \beta_b, \beta_c)\,(0, 1, 0, 0)^T = \beta_a.$$
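The constraint $M x_a = e_a$ is easy to verify for a concrete weight vector. The sketch below builds $M$ for the full $2^3$ design and checks the familiar contrast weights $x_{ag} = a_g / 8$. The column ordering of $M$ and the helper name `matvec` are assumptions.

```python
from itertools import product

points = list(product((-1, 1), repeat=3))          # assumed column ordering
# Rows of M: intercept, a, b, c; one column per design point.
M = [[1] * 8] + [[p[k] for p in points] for k in range(3)]


def matvec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


x_a = [p[0] / 8 for p in points]   # weights of the usual contrast estimator
result = matvec(M, x_a)
print(result)                      # equals e_a = (0, 1, 0, 0)
```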

12 Variances

Objective function: the variance of $\hat\beta_a = \sum_g x_{ag} Y_g$ is
$$\mathrm{Var}[\hat\beta_a] = \sum_g x_{ag}^2\, \mathrm{Var}[Y_g] = \sigma^2 \sum_g x_{ag}^2 = \sigma^2 \|x_a\|^2.$$
The same holds for $\hat\beta_b$ and $\hat\beta_c$. (Good estimators have small variances, so we want to minimize them.)
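As a small worked example of $\mathrm{Var}[\hat\beta_a] = \sigma^2 \|x_a\|^2$, the sketch below compares two unbiased weight vectors: one using all 8 design points and one using only the 4 points of the half fraction $abc = -1$. The ordering of the design points and the name `variance_factor` are assumptions.

```python
from itertools import product

points = list(product((-1, 1), repeat=3))          # assumed ordering


def variance_factor(x):
    """Var[beta_hat] / sigma^2 = sum_g x_g^2, for independent errors."""
    return sum(v * v for v in x)


x_full = [p[0] / 8 for p in points]
# Half fraction a*b*c = -1 (four points); unused points get weight zero.
x_half = [p[0] / 4 if p[0] * p[1] * p[2] == -1 else 0.0 for p in points]

print(variance_factor(x_full), variance_factor(x_half))
```

Halving the number of experiments doubles the variance factor, from 1/8 to 1/4. This is the trade-off that the penalty term in the group lasso formulation is meant to balance.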

13 Group Lasso

$$\min_{x_a, x_b, x_c \in \mathbb{R}^8} \ \|x_a\|^2 + \|x_b\|^2 + \|x_c\|^2 + \sum_{g} \lambda_g \sqrt{x_{ag}^2 + x_{bg}^2 + x_{cg}^2}$$
$$\text{subject to } M\,(x_a \ x_b \ x_c) = (e_a \ e_b \ e_c)$$

The three parts of the problem:
Unbiasedness: the equality constraint $M\,(x_a \ x_b \ x_c) = (e_a \ e_b \ e_c)$.
Variances: $\|x_a\|^2 + \|x_b\|^2 + \|x_c\|^2$.
Sparseness (fewer design points): $\sum_g \lambda_g \sqrt{x_{ag}^2 + x_{bg}^2 + x_{cg}^2}$.
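The sketch below evaluates this formulation for a given candidate: it checks the unbiasedness constraint and computes the objective as the variance term plus the group penalty. It does not solve the optimization problem. The function names `is_feasible` and `objective`, the ordering of the design points, and the value $\lambda_g = 1$ are assumptions made for illustration.

```python
import math
from itertools import product

points = list(product((-1, 1), repeat=3))           # assumed ordering
M = [[1] * 8] + [[p[k] for p in points] for k in range(3)]


def is_feasible(X, tol=1e-12):
    """Check M x_a = e_a, M x_b = e_b, M x_c = e_c (unbiasedness)."""
    for k, x in enumerate(X, start=1):
        for i, row in enumerate(M):
            target = 1.0 if i == k else 0.0
            if abs(sum(m * v for m, v in zip(row, x)) - target) > tol:
                return False
    return True


def objective(X, lam):
    """Sum of squared norms (variances) plus the group penalty (sparseness)."""
    variances = sum(v * v for x in X for v in x)
    penalty = sum(l * math.sqrt(sum(x[g] ** 2 for x in X))
                  for g, l in enumerate(lam))
    return variances + penalty


# Candidate: full-factorial contrast weights, one weight vector per factor.
X_full = [[p[k] / 8 for p in points] for k in range(3)]
value = objective(X_full, lam=[1.0] * 8)
print(is_feasible(X_full), value)
```

Each group in the penalty collects the three weights attached to one design point, so a zero group means that design point is dropped from all three estimators at once.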

14 Sparseness and the Number of Experiments

[Table: design matrix with rows intercept, $a$, $b$, $c$; outputs $Y_1, Y_2, \dots, Y_8$.]

Linear estimators:
$$\hat\beta_a = \sum_{g} x_{ag} Y_g, \qquad \hat\beta_b = \sum_{g} x_{bg} Y_g, \qquad \hat\beta_c = \sum_{g} x_{cg} Y_g$$
where $x_{ag}$, $x_{bg}$, $x_{cg}$ are the weights of the linear estimators.

If there exists $g$ such that $x_{ag} = 0$, $x_{bg} = 0$, $x_{cg} = 0$, then the output $Y_g$ of the $g$-th design point is not used for the estimators $\hat\beta_a$, $\hat\beta_b$, $\hat\beta_c$!!
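Reading off which experiments are actually needed from a solution amounts to finding the groups with a non-zero weight. The sketch below uses a hypothetical helper `used_design_points` and made-up weight vectors in which design points 2, 3, 5 and 8 carry zero weight in all three estimators.

```python
def used_design_points(x_a, x_b, x_c, tol=1e-9):
    """Return the 1-based indices g whose output Y_g enters some estimator."""
    return [g for g, w in enumerate(zip(x_a, x_b, x_c), start=1)
            if any(abs(v) > tol for v in w)]


# Hypothetical weights: points 2, 3, 5 and 8 are zero in all three vectors.
x_a = [-0.25, 0, 0, -0.25, 0, 0.25, 0.25, 0]
x_b = [-0.25, 0, 0, 0.25, 0, -0.25, 0.25, 0]
x_c = [-0.25, 0, 0, 0.25, 0, 0.25, -0.25, 0]

used = used_design_points(x_a, x_b, x_c)
print(used)   # [1, 4, 6, 7]
```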

15 Symmetries

If we set the tuning parameters to a common value, $\lambda_g = \lambda$ ($g = 1, \dots, 8$), then it causes a problem...

$$\min_{x_a, x_b, x_c \in \mathbb{R}^8} \ \|x_a\|^2 + \|x_b\|^2 + \|x_c\|^2 + \lambda \sum_{g} \sqrt{x_{ag}^2 + x_{bg}^2 + x_{cg}^2}$$
$$\text{subject to } M\,(x_a \ x_b \ x_c) = (e_a \ e_b \ e_c)$$

The above problem is strictly convex. Therefore, the minimum is unique.
The above problem is invariant under the exchanges of $x_a$, $x_b$, $x_c$.

16 Symmetries

[Table: design matrix with rows intercept, $a$, $b$, $c$ and outputs $Y_1, Y_2, \dots, Y_8$, with two L4 orthogonal arrays (columns $a$, $b$, $c$) marked.]

In this setting ($\lambda_g = \lambda$), we cannot obtain an L4 orthogonal array as the optimum solution of the problem. Both L4 arrays are feasible solutions, and the value of the objective function is the same for both. However, the optimum solution is unique!!

We have to set the tuning parameters with careful consideration to obtain a sparse solution.
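The argument can be checked numerically. The sketch below takes the two complementary half fractions $abc = -1$ and $abc = +1$ as the two L4 candidates, which is an assumption about which arrays the slide shows. It evaluates the equal-$\lambda$ objective at both and at their midpoint. The ordering of the design points, the value $\lambda = 1$, and the function names are also assumptions.

```python
import math
from itertools import product

points = list(product((-1, 1), repeat=3))           # assumed ordering


def objective(X, lam):
    """Equal-lambda objective: squared norms plus lam times the group norms."""
    variances = sum(v * v for x in X for v in x)
    penalty = sum(lam * math.sqrt(sum(x[g] ** 2 for x in X)) for g in range(8))
    return variances + penalty


def fraction_weights(sign):
    """Unbiased weights supported on the half fraction a*b*c = sign."""
    return [[p[k] / 4 if p[0] * p[1] * p[2] == sign else 0.0 for p in points]
            for k in range(3)]


X_minus, X_plus = fraction_weights(-1), fraction_weights(+1)
# Midpoint of the two candidates; it satisfies the same linear constraints.
X_mid = [[(u + v) / 2 for u, v in zip(xm, xp)]
         for xm, xp in zip(X_minus, X_plus)]

lam = 1.0
f_minus, f_plus, f_mid = (objective(X, lam) for X in (X_minus, X_plus, X_mid))
print(f_minus, f_plus, f_mid)
```

The two fractions give the same objective value, and their midpoint, which uses all 8 design points, gives a strictly smaller one. Neither L4 candidate can therefore be the unique minimizer when all $\lambda_g$ are equal.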

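17 Symmetries

$$\min_{x_a, x_b, x_c \in \mathbb{R}^8} \ \|x_a\|^2 + \|x_b\|^2 + \|x_c\|^2 + \sum_{g} \lambda_g \sqrt{x_{ag}^2 + x_{bg}^2 + x_{cg}^2}$$
$$\text{subject to } M\,(x_a \ x_b \ x_c) = (e_a \ e_b \ e_c)$$

A transformation $(x_a \ x_b \ x_c) \mapsto (\tilde x_a \ \tilde x_b \ \tilde x_c)$ does not change the problem if every column of
$$(x_a \ x_b \ x_c) - (\tilde x_a \ \tilde x_b \ \tilde x_c)$$
lies in $\ker M$ and
$$\|x_a\|^2 + \|x_b\|^2 + \|x_c\|^2 + \sum_{g} \lambda_g \sqrt{x_{ag}^2 + x_{bg}^2 + x_{cg}^2} = \|\tilde x_a\|^2 + \|\tilde x_b\|^2 + \|\tilde x_c\|^2 + \sum_{g} \lambda_g \sqrt{\tilde x_{ag}^2 + \tilde x_{bg}^2 + \tilde x_{cg}^2}.$$

Centrally symmetric?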
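The role of $\ker M$ can be illustrated directly: adding a kernel element to a feasible weight vector keeps it feasible. In the sketch below the $bc$ interaction column, scaled by $1/8$, is used as the kernel element. It is orthogonal to the intercept, $a$, $b$ and $c$ rows. The ordering of the design points and the names `matvec` and `kernel_vec` are assumptions.

```python
from itertools import product

points = list(product((-1, 1), repeat=3))           # assumed ordering
M = [[1] * 8] + [[p[k] for p in points] for k in range(3)]


def matvec(x):
    """Compute M x for the 4 x 8 matrix M defined above."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


x_full = [p[0] / 8 for p in points]                 # feasible: M x = e_a
# The b*c interaction column is orthogonal to every row of M.
kernel_vec = [p[1] * p[2] / 8 for p in points]
x_shifted = [u - v for u, v in zip(x_full, kernel_vec)]

print(matvec(kernel_vec))   # the zero vector: kernel_vec is in ker M
print(matvec(x_shifted))    # still e_a = (0, 1, 0, 0)
print(x_shifted)            # only four non-zero weights remain
```

Here the shift by a kernel element turns the dense full-factorial weights into weights supported on the half fraction $abc = -1$. Feasible solutions with different sparsity patterns thus differ only by elements of $\ker M$.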

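18 Numerical Examples

Example 1) Assume that there are 3 binary factors $a$, $b$, $c$.
Model: $Y = \mu + \beta_a a + \beta_b b + \beta_c c + \varepsilon$
Tuning parameters: $(\lambda_1, \dots, \lambda_8) = (0, 100, 100, 0, 100, 0, 0, 100)$
Result of the group lasso: the L4 orthogonal array with columns $a$, $b$, $c$.

Example 2) Assume that there are 4 binary factors $a$, $b$, $c$, $d$.
Model: $Y = \mu + \beta_a a + \beta_b b + \beta_c c + \beta_d d + \beta_{ab} ab + \beta_{ac} ac + \beta_{ad} ad + \varepsilon$
Tuning parameters: $(\lambda_1, \dots, \lambda_{16}) = (0, 0, 400, 320, 400, 320, 0, 0, 400, 320, 0, 0, 0, 0, 400, 320)$
Result of the group lasso: the L8 orthogonal array with columns $a$, $b$, $ab$, $c$, $ac$, $ad$, $d$.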
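For Example 1, one can check that the design points with $\lambda_g = 0$ form an orthogonal array. This is only a consistency check on the zero pattern of the tuning parameters, not a run of the group lasso itself. The lexicographic ordering of the 8 design points and the helper name `is_orthogonal_array` are assumptions.

```python
from itertools import combinations, product

points = list(product((-1, 1), repeat=3))           # assumed ordering
lam = (0, 100, 100, 0, 100, 0, 0, 100)              # Example 1 in the slides

# Design points that carry no penalty.
selected = [p for p, l in zip(points, lam) if l == 0]


def is_orthogonal_array(rows):
    """Strength-2 check for two-level columns: every pair of columns shows
    each of the four level combinations equally often."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        counts = {}
        for r in rows:
            counts[(r[i], r[j])] = counts.get((r[i], r[j]), 0) + 1
        if len(counts) != 4 or len(set(counts.values())) != 1:
            return False
    return True


print(selected)
print(is_orthogonal_array(selected))
```

Under the assumed ordering the four unpenalized points are exactly the half fraction $abc = -1$, which is an L4 array in the columns $a$, $b$, $c$.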

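19 Numerical Examples

Example 3) Assume that there are 4 binary factors $a$, $b$, $c$, $d$.
Model: $Y = \mu + \beta_a a + \beta_b b + \beta_c c + \beta_d d + \beta_{ab} ab + \beta_{ac} ac + \beta_{ad} ad + \beta_{bc} bc + \varepsilon$
Tuning parameters: $(\lambda_1, \dots, \lambda_{16}) =$ (0, 0, , , , , 1.11, 5.4, , , 13.02, 3.11, 17.3, 9.17, , )
Result of the group lasso: an array with columns $a$, $b$, $ab$, $c$, $ac$, $ad$, $d$, $bc$.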

20 Problems

Q: We need to choose appropriate values of $\lambda_g$. What kinds of $\lambda_g$ correspond to orthogonal arrays?
Choosing the values of $\lambda_g$ corresponds to choosing the weights (ordering) of the columns of $M$.

Q: Can we apply the holonomic gradient descent method to SDP (semidefinite programming) or SOCP (second-order cone programming) to obtain the optimal design matrix?
