Smoothing and variable selection using P-splines

1 Smoothing and variable selection using P-splines
Irène Gijbels, Department of Mathematics & Leuven Statistics Research Center, Katholieke Universiteit Leuven, Belgium
Joint work with Anestis Antoniadis, Anneleen Verhasselt and Sophie Lambert-Lacroix

2 Anestis Antoniadis
XIVth International Biometrics Conference in Namur, Belgium, in July 1988
first reading: Nonparametric penalized maximum likelihood estimation of the intensity of a counting process, AISM, 1989
joint work on:
model selection using wavelet decomposition (with G. Grégoire)
change point detection
change point detection in hazard functions (with B. MacGibbon)
unfolding sphere size distributions using wavelets (with J. Fan)

3 penalized wavelet monotone regression (with J. Bigot)
smoothing non-equispaced heavy noisy data with wavelets (with Jean-Michel Poggi)
penalized likelihood regression for generalized linear models with nonquadratic penalties (with Mila Nikolova)
variable selection in additive models and in varying coefficient models using P-splines (with Anneleen Verhasselt and Sophie Lambert-Lacroix)

4 Seminal contributions to many areas
wavelets and their applications
intensity function estimation
survival analysis and point processes
inverse problems (in particular Poisson inverse problems)
constrained estimation
analysis of functional data
development of statistical methods for microarray data
an excellent and very dynamic researcher, with extremely broad knowledge

5 Some characteristics
many services to the profession: associate editor of several international journals, serving on scientific evaluation boards for many years
very supportive of young (and middle-aged and old) researchers, always helping with scientific advice
a remarkable honesty and modesty
an admirable personality; an example for many...

6 THANK YOU!

7 Additive models: introduction
$Y$: response variable
$(X_1, \ldots, X_d)$: vector of $d$ explanatory variables
additive model: $Y = f_0 + \sum_{j=1}^{d} f_j(X_j) + \varepsilon$, with $E(f_j(X_j)) = 0$
$\varepsilon$: random noise term with mean 0 and variance $\sigma^2$
$f_j$: unknown univariate functions
often only a few components $f_j$ are different from 0
aim: to select and estimate the non-zero components $f_j$

8 Nonnegative garrote method
original nonnegative garrote method proposed by Breiman (1995) in a multiple linear regression model
data $(Y_i, X_{i1}, \ldots, X_{id})$, $i = 1, \ldots, n$, from
$$Y_i = \beta_0 + \sum_{j=1}^{d} \beta_j X_{ij} + \varepsilon_i$$
$\hat{\beta}_j^{OLS}$: ordinary least squares estimator of $\beta_j$

9 basic idea: the nng method shrinks the least squares estimators $\hat{\beta}_j^{OLS}$
shrinkage done via $c_j \hat{\beta}_j^{OLS}$, with $c_j \geq 0$ and a bound on $\sum_{j=1}^{d} c_j$
task: how to find the shrinkage factors $c_j$?

10 the nonnegative garrote shrinkage factors $\hat{c}_j$ are found by solving
$$(\hat{c}_1, \ldots, \hat{c}_d) = \operatorname*{argmin}_{c_1, \ldots, c_d} \frac{1}{2} \sum_{i=1}^{n} \Big( Y_i - \hat{\beta}_0^{OLS} - \sum_{j=1}^{d} c_j \hat{\beta}_j^{OLS} X_{ij} \Big)^2 \quad \text{s.t. } 0 \leq c_j \; (j = 1, \ldots, d), \; \sum_{j=1}^{d} c_j \leq s$$
for given $s$, or equivalently
$$(\hat{c}_1, \ldots, \hat{c}_d) = \operatorname*{argmin}_{c_1, \ldots, c_d} \frac{1}{2} \sum_{i=1}^{n} \Big( Y_i - \hat{\beta}_0^{OLS} - \sum_{j=1}^{d} c_j \hat{\beta}_j^{OLS} X_{ij} \Big)^2 + \theta \sum_{j=1}^{d} c_j \quad \text{s.t. } 0 \leq c_j \; (j = 1, \ldots, d)$$
for given $\theta > 0$
$s > 0$ and $\theta > 0$: regularization parameters (see e.g. Xiong (2010))
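
As an aside (not from the slides): the penalized formulation is a small bound-constrained quadratic program. A minimal sketch, with illustrative names and scipy's L-BFGS-B solver as an assumed tool:

```python
# A minimal sketch of the nonnegative garrote (penalized form), assuming a
# design matrix X (n x d), response y and regularization parameter theta;
# function and variable names are illustrative, not from the talk.
import numpy as np
from scipy.optimize import minimize

def nonnegative_garrote(X, y, theta):
    n, d = X.shape
    X1 = np.column_stack([np.ones(n), X])
    beta_ols = np.linalg.lstsq(X1, y, rcond=None)[0]   # OLS fit with intercept
    beta0, beta = beta_ols[0], beta_ols[1:]
    Z = X * beta                    # column j of Z is beta_ols_j * X[:, j]
    r = y - beta0                   # response minus fitted intercept

    def objective(c):               # 0.5 * ||r - Z c||^2 + theta * sum(c)
        resid = r - Z @ c
        return 0.5 * resid @ resid + theta * c.sum()

    def gradient(c):
        return -Z.T @ (r - Z @ c) + theta

    res = minimize(objective, x0=np.ones(d), jac=gradient,
                   method="L-BFGS-B", bounds=[(0.0, None)] * d)
    c_hat = res.x                   # shrinkage factors
    return c_hat, c_hat * beta      # NNG coefficients are c_hat_j * beta_ols_j
```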

11 the nonnegative garrote estimator of the regression coefficient $\beta_j$ is $\hat{\beta}_j^{NNG} = \hat{c}_j \hat{\beta}_j^{OLS}$
special case: orthogonal design, i.e. $X^{\top} X = I$:
$$\hat{c}_j = \Big( 1 - \frac{\theta}{(\hat{\beta}_j^{OLS})^2} \Big)_+, \qquad z_+ = \max(z, 0)$$
the larger $\theta$, the stronger the shrinkage effect

12 [Figure 1: Shrinkage effect of the nonnegative garrote for different values of $\theta$ ($\theta = 0, 0.01, 0.05, \ldots$): $\hat{\beta}^{NNG}$ plotted against $\hat{\beta}^{OLS}$]

13 Relation with other estimation methods
LASSO: $(\hat{\beta}_1^{Lasso}, \ldots, \hat{\beta}_d^{Lasso}) = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} \big( Y_i - \beta_0 - \sum_{j=1}^{d} \beta_j X_{ij} \big)^2$ s.t. $\sum_{j=1}^{d} |\beta_j| \leq s$
Ridge: $(\hat{\beta}_1^{Ridge}, \ldots, \hat{\beta}_d^{Ridge}) = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} \big( Y_i - \beta_0 - \sum_{j=1}^{d} \beta_j X_{ij} \big)^2$ s.t. $\sum_{j=1}^{d} \beta_j^2 \leq s$
[Figure: the identity, nonnegative garrote, lasso and ridge rules plotted against the least squares estimate]

14 More relations with (other) thresholding rules (see, e.g., the literature on wavelet methods, work by Anestis Antoniadis)
hard-thresholding rule: $\delta_{\lambda}^{H}(\hat{\beta}_j) = \begin{cases} 0 & \text{if } |\hat{\beta}_j| \leq \lambda \\ \hat{\beta}_j & \text{if } |\hat{\beta}_j| > \lambda \end{cases}$
soft-thresholding rule: $\delta_{\lambda}^{S}(\hat{\beta}_j) = \begin{cases} 0 & \text{if } |\hat{\beta}_j| \leq \lambda \\ \hat{\beta}_j - \lambda & \text{if } \hat{\beta}_j > \lambda \\ \hat{\beta}_j + \lambda & \text{if } \hat{\beta}_j < -\lambda \end{cases}$
hard-thresholding (a discontinuous function): keep or kill rule
soft-thresholding (a continuous function): shrink or kill rule
Bruce & Gao (1996) and Marron, Adak, Johnstone, Neumann & Patil (1998), ..., Gao (2008), ...

15 another thresholding rule
Antoniadis & Fan (2001) suggested the SCAD (Smoothly Clipped Absolute Deviation) thresholding rule
$$\delta_{\lambda}^{SCAD}(\hat{\beta}_j) = \begin{cases} \operatorname{sign}(\hat{\beta}_j) \max\big(0, |\hat{\beta}_j| - \lambda\big) & \text{if } |\hat{\beta}_j| \leq 2\lambda \\ \dfrac{(a-1)\hat{\beta}_j - a\lambda \operatorname{sign}(\hat{\beta}_j)}{a-2} & \text{if } 2\lambda < |\hat{\beta}_j| \leq a\lambda \\ \hat{\beta}_j & \text{if } |\hat{\beta}_j| > a\lambda \end{cases}$$
where $a > 2$
this is also a shrink or kill rule (a piecewise linear function)
this rule does not over-penalize large values of $|\hat{\beta}_j|$ and hence does not create excessive bias when the regression coefficients are large
Antoniadis & Fan (2001), based on a Bayesian argument, recommended the value $a = 3.7$

16 [Figure: the four thresholding functions $\delta_{\lambda}$ plotted against $x$: hard-thresholding, soft-thresholding, NNG-thresholding and SCAD]
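
A minimal NumPy sketch of the four rules in this figure, with the NNG rule in its orthogonal-design form from slide 11 (parameter defaults are illustrative):

```python
# Thresholding rules: hard, soft, NNG (orthogonal design) and SCAD.
import numpy as np

def hard(beta, lam):
    # keep or kill: set coefficients with |beta| <= lam to zero
    return np.where(np.abs(beta) > lam, beta, 0.0)

def soft(beta, lam):
    # shrink or kill: move each surviving coefficient lam towards zero
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

def nng(beta, theta):
    # nonnegative garrote rule for an orthogonal design: beta * (1 - theta/beta^2)_+
    return beta * np.maximum(1.0 - theta / beta ** 2, 0.0)

def scad(beta, lam, a=3.7):
    # SCAD rule of Antoniadis & Fan (2001): soft near zero, identity for large |beta|
    b = np.abs(beta)
    inner = np.sign(beta) * np.maximum(b - lam, 0.0)
    middle = ((a - 1) * beta - a * lam * np.sign(beta)) / (a - 2)
    return np.where(b <= 2 * lam, inner, np.where(b <= a * lam, middle, beta))
```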

17 Functional nonnegative garrote method
extension of the nng to additive models: see Cantoni, Fleming & Ronchetti (2011) and Yuan (2007)
start with an initial estimator $\hat{f}_j^{init}(X_j)$ of $f_j(X_j)$
the nonnegative garrote shrinkage factors are then found via
$$\min_{c_1, \ldots, c_d} \Big\{ \frac{1}{2} \sum_{i=1}^{n} \Big( Y_i - \hat{f}_0^{init} - \sum_{j=1}^{d} c_j \hat{f}_j^{init}(X_{ij}) \Big)^2 + \theta \sum_{j=1}^{d} c_j \Big\} \quad \text{s.t. } 0 \leq c_j \; (j = 1, \ldots, d)$$
the nonnegative garrote estimate of $f_j$ is $\hat{f}_j^{NNG}(\cdot) = \hat{c}_j \hat{f}_j^{init}(\cdot)$
Cantoni, Fleming & Ronchetti (2011) use smoothing splines for the initial estimator; here: using P-splines...

18 univariate P-spline estimation
P-splines, introduced by Eilers & Marx (1996), in the univariate nonparametric smoothing context $Y_i = f(X_i) + \varepsilon_i$ for $i = 1, \ldots, n$
P-splines are an extension of regression splines, with a penalty on the coefficients of adjacent B-splines
data $(X_i, Y_i)$, for $i = 1, \ldots, n$, with $X_i \in [0, 1] \subset \mathbb{R}$
regression spline model: approximate $f(x)$ with $\sum_{j=1}^{m} \alpha_j B_j(x; q)$, where $\{B_j(\cdot; q) : j = 1, \ldots, K + q = m\}$ is the $q$-th degree B-spline basis, using normalized B-splines such that $\sum_j B_j(x; q) = 1$, with $K + 1$ equidistant knot points $t_0 = 0, t_1 = \frac{1}{K}, \ldots, t_K = 1$ in $[0, 1]$
$\alpha = (\alpha_1, \ldots, \alpha_m)^{\top}$: unknown column vector of regression coefficients

19 penalized least squares estimator: $\hat{\alpha}$ is the minimizer of
$$S(\alpha) = \sum_{i=1}^{n} \Big( Y_i - \sum_{j=1}^{m} \alpha_j B_j(X_i; q) \Big)^2 + \lambda \sum_{j=k+1}^{m} \big( \Delta^k \alpha_j \big)^2$$
$\lambda > 0$: smoothing parameter
the differencing operator: $\Delta^k \alpha_j = \sum_{t=0}^{k} (-1)^t \binom{k}{t} \alpha_{j-t}$, with $k \in \mathbb{N}$
examples: $k = 1$: $\Delta^1 \alpha_j = \alpha_j - \alpha_{j-1}$; $k = 2$: $\Delta^2 \alpha_j = \alpha_j - 2\alpha_{j-1} + \alpha_{j-2}$
rewriting in matrix notation: $S(\alpha) = (Y - B\alpha)^{\top} (Y - B\alpha) + \lambda \alpha^{\top} D_k^{\top} D_k \alpha$
the elements $B_{ij}$ of $B$ ($\in \mathbb{R}^{n \times m}$) are $B_j(X_i; q)$
$D_k$ ($\in \mathbb{R}^{(m-k) \times m}$): matrix representation of the $k$-th order differencing operator $\Delta^k$
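
A minimal sketch of this penalized fit, assuming scipy >= 1.8 for the B-spline design-matrix helper (knot padding and defaults are illustrative choices):

```python
# Univariate P-spline fit: q-th degree B-splines on K+1 equidistant knots in
# [0, 1] with a k-th order difference penalty on the coefficients.
import numpy as np
from scipy.interpolate import BSpline

def pspline_fit(x, y, K=20, q=3, k=2, lam=1.0):
    # interior knots 0, 1/K, ..., 1, padded with q boundary knots on each side,
    # giving m = K + q basis functions
    knots = np.concatenate([np.zeros(q), np.linspace(0.0, 1.0, K + 1), np.ones(q)])
    m = K + q
    B = BSpline.design_matrix(x, knots, q).toarray()   # n x m matrix B_j(X_i; q)
    D = np.diff(np.eye(m), n=k, axis=0)                # (m-k) x m difference matrix D_k
    # penalized least squares: alpha_hat = (B'B + lam D'D)^{-1} B'y
    alpha = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return lambda xnew: BSpline.design_matrix(xnew, knots, q).toarray() @ alpha
```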

20 Additive model: Nonnegative garrote with P-splines initial estimator
additive model: $Y = f_0 + \sum_{j=1}^{d} f_j(X_j) + \varepsilon$, with $E(f_j(X_j)) = 0$
key ingredients to prove the consistency of the nonnegative garrote with P-splines:
(i) a consistency result for (univariate) P-splines (Claeskens, Krivobokova & Opsomer (2009))
(ii) the extension of a univariate smoothing estimator to additive models via backfitting (relying on results from Horowitz, Klemelä & Mammen (2006))
(iii) a consistency result for the functional nonnegative garrote (e.g. Yuan (2007))

21 Consistency of nonnegative garrote with P-splines
notation: $f_j = (f_j(X_{1j}), \ldots, f_j(X_{nj}))^{\top}$
Theorem: Under some assumptions, and if $\theta_n$ tends to 0 such that $\kappa_n = n^{-(q+1)/(2q+3)} = o(\sqrt{\theta_n})$, then (given $X_{ij} = x_{ij}$)
(1) $P(\hat{f}_j^{NNG} = 0) \to 1$ for any $j$ such that $f_j = 0$
(2) $\sup_j \frac{1}{n} \| \hat{f}_j^{NNG} - f_j \|_2^2 = O_P\big( \theta_n^2 \big)$
in other words: the nonnegative garrote method with P-splines is variable selection consistent (1) and estimation consistent (2)

22 Other selection methods
COSSO (Component Selection and Smoothing Operator, Lin & Zhang (2006))
ACOSSO (Adaptive Component Selection and Smoothing Operator, Storlie, Bondell, Reich & Zhang (2010))
APSO (Adaptive P-splines Selection Operator, Antoniadis et al. (2011))

23 Varying coefficient models: introduction (Fan and Zhang (2000), ...)
$$Y(t) = \beta_0(t) + \sum_{p=1}^{d} X^{(p)}(t) \beta_p(t) + \varepsilon(t) = X(t)^{\top} \beta(t) + \varepsilon(t)$$
$Y(t)$: the response at time $t$ ($t \in \mathcal{T} = [0, T]$)
$X(t) = (X^{(0)}(t), \ldots, X^{(d)}(t))^{\top}$: covariate vector at time $t$, with $X^{(0)}(t) \equiv 1$
$\beta(t) = (\beta_0(t), \ldots, \beta_d(t))^{\top}$: the vector of coefficients at time $t$; $\beta_0(t)$ is the baseline effect
$\varepsilon(t)$: a mean-zero stochastic process at time $t$

24 longitudinal data, i.e. samples with $n$ independent subjects or individuals, each measured repeatedly over a time period
the $j$-th measurement of $(t, Y(t), X(t))$ for subject $i$ is denoted by $(t_{ij}, Y_{ij}, X_{ij})$, $1 \leq i \leq n$, $1 \leq j \leq N_i$
$N_i$: the number of repeated measurements of subject $i$
$t_{ij}$: the measurement time; $Y_{ij}$: the observed response at time $t_{ij}$; $X_{ij} = (X_{ij}^{(0)}, \ldots, X_{ij}^{(d)})^{\top}$
$N = \sum_{i=1}^{n} N_i$: the total number of observations

25 P-spline estimation in varying coefficient models (see Lu, Zhang & Zhu (2008), Wang & Huang (2008), ...)
suppose each unknown function $\beta_p(t)$, $p = 0, \ldots, d$, can be approximated by a B-spline basis expansion
$$\beta_p(t) = \sum_{l=1}^{m_p} B_{pl}(t; q_p) \alpha_{pl}$$
where $\{B_{pl}(\cdot; q_p) : l = 1, \ldots, K_p + q_p = m_p\}$ is the $q_p$-th degree B-spline basis with $K_p + 1$ equidistant knots for the $p$-th component

26 the P-spline estimates of the regression coefficients $\alpha_{pl}$ are obtained by minimizing $S(\alpha)$ with respect to $\alpha = (\alpha_0^{\top}, \ldots, \alpha_d^{\top})^{\top} \in \mathbb{R}^{\dim \times 1}$, where $\alpha_p = (\alpha_{p1}, \ldots, \alpha_{pm_p})^{\top}$ and $\dim = \sum_p m_p$:
$$S(\alpha) = \sum_{i=1}^{n} \frac{1}{N_i} \sum_{j=1}^{N_i} \Big( Y_{ij} - \sum_{p=0}^{d} \sum_{l=1}^{m_p} X_{ij}^{(p)} B_{pl}(t_{ij}; q_p) \alpha_{pl} \Big)^2 + \sum_{p=0}^{d} \lambda_p \alpha_p^{\top} D_{k_p}^{\top} D_{k_p} \alpha_p$$
$k_p$: the differencing order for the $p$-th component
$\lambda_p$: the smoothing parameters

27 in matrix notation:
$$S(\alpha) = \sum_{i=1}^{n} \frac{1}{N_i} \sum_{j=1}^{N_i} \Big( Y_{ij} - \sum_{p=0}^{d} \sum_{l=1}^{m_p} X_{ij}^{(p)} B_{pl}(t_{ij}; q_p) \alpha_{pl} \Big)^2 + \sum_{p=0}^{d} \lambda_p \alpha_p^{\top} D_{k_p}^{\top} D_{k_p} \alpha_p = \sum_{i=1}^{n} (Y_i - U_i \alpha)^{\top} W_i (Y_i - U_i \alpha) + \alpha^{\top} Q_{\lambda} \alpha$$
with
$Y_i = (Y_{i1}, \ldots, Y_{iN_i})^{\top}$
$B(t)$: the block-diagonal matrix with blocks $\big( B_{p1}(t; q_p), \ldots, B_{pm_p}(t; q_p) \big)$, $p = 0, \ldots, d$
$U_{ij} = X_{ij}^{\top} B(t_{ij}) \in \mathbb{R}^{1 \times \dim}$, $U_i = (U_{i1}^{\top}, \ldots, U_{iN_i}^{\top})^{\top} \in \mathbb{R}^{N_i \times \dim}$
$W_i = \operatorname{diag}\big( N_i^{-1}, \ldots, N_i^{-1} \big) \in \mathbb{R}^{N_i \times N_i}$ (a diagonal matrix with $N_i^{-1}$ appearing $N_i$ times on the diagonal)
$Q_{\lambda} = \operatorname{diag}\big( \lambda_0 D_{k_0}^{\top} D_{k_0}, \ldots, \lambda_d D_{k_d}^{\top} D_{k_d} \big) \in \mathbb{R}^{\dim \times \dim}$ (a block-diagonal matrix with the matrices $\lambda_p D_{k_p}^{\top} D_{k_p}$ on the diagonal)

28 introducing further matrix notation,
$$S(\alpha) = \sum_{i=1}^{n} (Y_i - U_i \alpha)^{\top} W_i (Y_i - U_i \alpha) + \alpha^{\top} Q_{\lambda} \alpha = \| \tilde{Y} - \tilde{U} \alpha \|_2^2 + \alpha^{\top} Q_{\lambda} \alpha$$
with
$Y = (Y_1^{\top}, \ldots, Y_n^{\top})^{\top} \in \mathbb{R}^{N \times 1}$
$W = \operatorname{diag}(W_i)_{i=1,\ldots,n} \in \mathbb{R}^{N \times N}$, $\tilde{Y} = W^{1/2} Y$
$U = [U_0, \ldots, U_d]$, $\tilde{U} = W^{1/2} U$
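
In this notation the minimizer of $S(\alpha)$ has a closed form via the normal equations; a one-line sketch, with the inputs assumed already assembled as above:

```python
# Penalized least squares solution alpha_hat = (U~'U~ + Q_lambda)^{-1} U~'Y~.
import numpy as np

def penalized_ls(U_tilde, Y_tilde, Q_lambda):
    # normal equations of ||Y~ - U~ alpha||^2 + alpha' Q_lambda alpha
    return np.linalg.solve(U_tilde.T @ U_tilde + Q_lambda, U_tilde.T @ Y_tilde)
```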

29 consistency results are proved for the case where the number of knots $K_p + 1$ (and thus $m_p = K_p + q_p$) grows with $n$
$\beta_p(\cdot)$ is not a spline function itself, but can be approximated by a spline function
theoretical results:
consistency: $\| \hat{\beta}_p(t) - \beta_p(t) \|_{L_2} = O_P\Big( \Big( \frac{1}{n^2} \sum_{i=1}^{n} \frac{1}{N_i} \Big)^{q/(2q+1)} \Big)$
asymptotic normality

30 Nonnegative garrote selection method
the nonnegative garrote shrinkage factors $\hat{c} = (\hat{c}_0, \ldots, \hat{c}_d)$ are obtained from the optimization problem
$$\min_{c_0, \ldots, c_d} \sum_{i=1}^{n} \frac{1}{N_i} \sum_{j=1}^{N_i} \Big( Y_{ij} - \sum_{p=0}^{d} X_{ij}^{(p)} c_p \hat{\beta}_p^{init}(t_{ij}) \Big)^2 + \theta \sum_{p=0}^{d} c_p \quad \text{s.t. } 0 \leq c_p \; (p = 0, \ldots, d)$$
$\hat{\beta}_p^{init}(\cdot)$: initial estimator of the regression coefficient function $\beta_p(\cdot)$
$\theta > 0$ is a regularization parameter
we use the P-spline estimator as an initial estimator

31 some more matrix notations:
$$\min_{c} \| \tilde{Y} - \tilde{Z} c \|_2^2 + \theta \sum_{p=0}^{d} c_p \quad \text{s.t. } 0 \leq c_p \; (p = 0, \ldots, d)$$
where
$Y = (Y_1^{\top}, \ldots, Y_n^{\top})^{\top} \in \mathbb{R}^{N \times 1}$, $W = \operatorname{diag}(W_i)_{i=1,\ldots,n} \in \mathbb{R}^{N \times N}$, $\tilde{Y} = W^{1/2} Y$
$z_i^{(p)} = \big( X_{i1}^{(p)}, \ldots, X_{iN_i}^{(p)} \big) \operatorname{diag}\big( \hat{\beta}_p^{init}(t_{ij}) \big)_{j=1,\ldots,N_i} \in \mathbb{R}^{1 \times N_i}$
$Z_p = \big( z_1^{(p)}, \ldots, z_n^{(p)} \big)^{\top} \in \mathbb{R}^{N \times 1}$, $Z = [Z_0, \ldots, Z_d]$, $\tilde{Z} = W^{1/2} Z$, $c = (c_0, \ldots, c_d)^{\top}$

32 the P-spline estimator $\hat{f}_p(t) = X^{(p)}(t) \hat{\beta}_p(t)$ of the $p$-th component $f_p(t) = X^{(p)}(t) \beta_p(t)$ is consistent
it can be shown that the nonnegative garrote estimator with the P-spline estimator as initial estimator for $\beta_p(t)$,
$$\hat{f}_p^{NNG}(t) = \hat{c}_p \hat{f}_p(t),$$
is estimation consistent and variable selection consistent
other selection methods: Adaptive P-spline Selection Operator (APSO), ...

33 example: CD4 data
the data are a subset from the Multicenter AIDS Cohort Study (Kaslow et al. (1987))
they contain repeated measurements of physical examinations, laboratory results, CD4 cell counts and CD4 percentages of 283 homosexual men who became HIV-positive between 1984 and 1991
unequal numbers of repeated measurements and different measurement times for each individual
aim: to evaluate the effects of cigarette smoking, pre-HIV-infection CD4 cell percentage and age at HIV infection on the mean CD4 percentage after infection
the number of repeated measurements ranged from 1 to 14, with a median of 6 and a mean of 6.57

34 the number of distinct time points was 59
covariates:
$X_i^{(1)}$: the smoking status of the $i$-th individual (1 or 0 if the individual ever or never smoked cigarettes)
$X_i^{(2)}$: the centered age at HIV infection of the $i$-th individual
$X_i^{(3)}$: the centered pre-infection CD4 percentage

35 varying coefficient model for $Y_{ij}$:
$$Y_{ij} = \beta_0(t_{ij}) + \sum_{p=1}^{3} X_i^{(p)} \beta_p(t_{ij}) + \varepsilon_{ij}$$
$\beta_0(t)$, the baseline CD4 percentage, represents the mean CD4 percentage $t$ years after HIV infection for a nonsmoker with average pre-infection CD4 percentage and average age at infection
[Table 1: AIDS data. Summary parameters (NS and RSS) for the ngp and APSO methods]

36 [Figure 2: AIDS data. Fitted (a) baseline effect; (b) coefficient of smoking status; (c) coefficient of age at HIV infection; (d) coefficient of pre-infection CD4, each as a function of time since infection, for the ngp and APSO fits]

37 [Figure 3: AIDS data. Fitted CD4 percentage as a function of time since infection, for 3 pre-infection CD4 percentages]

38 Grouped regularization methods
$$Y(t) = \beta_0(t) + \sum_{p=1}^{d} X^{(p)}(t) \beta_p(t) + \varepsilon(t) = X(t)^{\top} \beta(t) + \varepsilon(t)$$
approximation in terms of a basis of smooth functions: $\beta_p(t) \approx \sum_{l=1}^{L_p} B_l^{(p)}(t) \gamma_{pl}$ ($L_p$ large)
as before, introducing the appropriate matrix notations,
$$\sum_{i=1}^{n} \frac{1}{N_i} \sum_{j=1}^{N_i} \Big( Y_i(t_{ij}) - \sum_{k=0}^{d} \sum_{l=1}^{L_k} \gamma_{kl} X_i^{(k)}(t_{ij}) B_l^{(k)}(t_{ij}) \Big)^2 = \| \tilde{Y} - \tilde{Z} \gamma \|_2^2$$
$\tilde{Z}$: dimension $N \times \big( \sum_{p=0}^{d} L_p \big)$; $\gamma$: dimension $\big( \sum_{p=0}^{d} L_p \big) \times 1$

39 grouped Lasso regularization
minimize, with respect to the vector of parameters $\gamma = (\gamma_0^{\top}, \ldots, \gamma_d^{\top})^{\top}$,
$$\frac{1}{2n} \| \tilde{Y} - \tilde{Z} \gamma \|_2^2 + \lambda \sum_{p=0}^{d} w_p \| \gamma_p \|_2, \qquad w_p = \sqrt{L_p}$$
defining $p_{\lambda}(v) = \lambda v$, for $v \geq 0$, this can be written as: minimize
$$\frac{1}{2n} \| \tilde{Y} - \tilde{Z} \gamma \|_2^2 + \sum_{p=0}^{d} p_{\lambda}\big( w_p \| \gamma_p \|_2 \big), \qquad w_p = \sqrt{L_p}$$
Liu and Zhang (2008)
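
This objective can be minimized by proximal gradient descent, block-soft-thresholding each group at every step; a minimal sketch (the algorithm choice here is illustrative, not the one used on the slides):

```python
# Proximal gradient descent for (1/2n)||y - Z g||^2 + lam * sum_p w_p ||g_p||_2;
# `groups` lists the column indices of each coefficient block gamma_p.
import numpy as np

def group_lasso(Z, y, groups, lam, iters=500):
    n, m = Z.shape
    step = n / np.linalg.norm(Z, 2) ** 2        # 1/L, with L = ||Z||_2^2 / n
    gamma = np.zeros(m)
    for _ in range(iters):
        grad = -Z.T @ (y - Z @ gamma) / n       # gradient of the quadratic part
        g = gamma - step * grad
        for idx in groups:                      # block soft-thresholding per group
            w = np.sqrt(len(idx))               # group weight w_p = sqrt(L_p)
            nrm = np.linalg.norm(g[idx])
            scale = 0.0 if nrm == 0.0 else max(0.0, 1.0 - step * lam * w / nrm)
            g[idx] = scale * g[idx]
        gamma = g
    return gamma
```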

40 grouped SCAD regularization
the SCAD penalty $p_{\lambda}(v)$, for $v \geq 0$, is
$$p_{\lambda}(v) = \begin{cases} \lambda v & \text{if } 0 \leq v \leq \lambda \\ -\dfrac{v^2 - 2a\lambda v + \lambda^2}{2(a-1)} & \text{if } \lambda < v < a\lambda \\ \dfrac{(a+1)\lambda^2}{2} & \text{if } v \geq a\lambda \end{cases}$$
grouped SCAD procedure: minimize, with respect to the vector of parameters $\gamma$,
$$\frac{1}{2n} \| \tilde{Y} - \tilde{Z} \gamma \|_2^2 + \sum_{p=0}^{d} p_{\lambda}\big( \omega_p \| \gamma_p \|_2 \big), \qquad \omega_p = \sqrt{L_p}$$

41 possible approach
a Taylor expansion of $p_{\lambda}(v)$ for $v$ around $v_0$ gives
$$p_{\lambda}(v) \approx p_{\lambda}(v_0) + \frac{1}{2} \frac{p_{\lambda}'(v_0)}{v_0} \big( v^2 - v_0^2 \big)$$
(see Fan and Li (2001))
this leads to solving a Ridge-regression type of problem (restricted to $d_n < n$)
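
A minimal sketch of one such local quadratic approximation (LQA) step for an ungrouped SCAD penalty, following the Fan & Li (2001) recipe; the small eps guard is an implementation convenience, not part of the method:

```python
# One LQA step: approximate the SCAD penalty by a quadratic at the current
# iterate, which turns the update into a ridge-type linear system.
import numpy as np

def scad_deriv(v, lam, a=3.7):
    # p'_lambda(v) for the SCAD penalty, v >= 0
    return np.where(v <= lam, lam, np.maximum(a * lam - v, 0.0) / (a - 1))

def lqa_step(Z, y, gamma, lam, eps=1e-6):
    n = y.size
    v = np.abs(gamma) + eps                    # avoid dividing by zero at gamma_j = 0
    Sigma = np.diag(scad_deriv(v, lam) / v)    # quadratic weights p'(v0)/v0
    # ridge-type update: (Z'Z/n + Sigma) gamma_new = Z'y/n
    return np.linalg.solve(Z.T @ Z / n + Sigma, Z.T @ y / n)
```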

42 first grouped SCAD regularization procedure: minimize, with respect to the vector of parameters $\gamma$,
$$\frac{1}{2n} \| \tilde{Y} - \tilde{Z} \gamma \|_2^2 + \sum_{p=0}^{d} p_{\lambda}\big( \omega_p \| \gamma_p \|_2 \big), \qquad \omega_p = \sqrt{L_p}$$
second grouped SCAD regularization procedure: minimize, with respect to $\gamma$,
$$\frac{1}{2n} \| \tilde{Y} - \tilde{Z} \gamma \|_2^2 + \sum_{p=0}^{d} p_{\lambda}\big( \| \gamma_p \|_1 \big)$$
algorithm for solving this optimization problem: in Breheny & Huang (2009)

43 grouped Bridge regularization
penalty function, for $v > 0$: $p_{\lambda}(v) = \lambda v^q$ with $0 < q < 1$
grouped Bridge approach: minimize the objective function
$$\frac{1}{2n} \| \tilde{Y} - \tilde{Z} \gamma \|_2^2 + \sum_{p=0}^{d} p_{\lambda}\big( \omega_p \| \gamma_p \|_2 \big)$$
an algorithm for solving this optimization problem: from Breheny & Huang (2009)

44 Simulation studies
comparison of 5 grouped regularization methods:
gbridge: grouped Bridge ($q = 1/2$); local coordinate descent algorithm from Breheny & Huang (2009)
gscad: grouped SCAD; idem
gmcp: grouped MCP; idem
glasso 1: grouped Lasso; idem
glasso 2: grouped Lasso; implementation by Meier, van de Geer & Bühlmann (2008)

45 tuning parameter $\lambda$ chosen by a BIC-type of criterion:
$$\log\Big( \frac{RSS_{\lambda}}{\sum_{i=1}^{n} N_i} \Big) + \frac{\log\big( \sum_{i=1}^{n} N_i \big)}{\sum_{i=1}^{n} N_i} \, df_{\lambda}$$
where $RSS_{\lambda}$ is the residual sum of squares and $df_{\lambda}$ is the number of nonzero coefficients of $\hat{\gamma}$
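
A minimal sketch of this criterion, with the fitted residuals and coefficients assumed given (the zero tolerance is an illustrative choice):

```python
# BIC-type criterion: log(RSS/N) + (log(N)/N) * df, with N = sum_i N_i and
# df the number of nonzero coefficients of gamma_hat.
import numpy as np

def bic_criterion(residuals, gamma_hat, tol=1e-8):
    N = residuals.size
    rss = np.sum(residuals ** 2)
    df = int(np.sum(np.abs(gamma_hat) > tol))
    return np.log(rss / N) + np.log(N) / N * df
```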

46 simulation model (from Huang, Wu & Zhou (2002) and Wang, Li & Huang (2008)):
$$Y_i(t_{ij}) = \beta_0(t_{ij}) + \sum_{p=1}^{23} \beta_p(t_{ij}) X_i^{(p)}(t_{ij}) + \varepsilon_i(t_{ij}), \quad i = 1, \ldots, n, \; j = 1, \ldots, \tilde{N}$$
intercept term and the three true relevant variables:
$$\beta_0(t) = 15 + 20 \sin\Big( \frac{\pi t}{60} \Big), \quad \beta_1(t) = 2 - 3 \cos\Big( \frac{\pi (t - 25)}{15} \Big), \quad \beta_2(t) = 6 - 0.2 t, \quad \beta_3(t) = -4 + \frac{(20 - t)^3}{2000}, \qquad t \in [1, 30]$$
remaining coefficients: $\beta_p(t) = 0$, $p = 4, \ldots, 23$
time points $t_{ij}$ given by $1, 2, \ldots, 30$ ($\tilde{N} = 30$) and $n = 100$

47 simulation of the three relevant variables $X_i^{(k)}(t)$, $k = 1, \ldots, 3$:
at any point $t$, the variable $X_i^{(1)}(t)$ is sampled uniformly from $[t/10, 2 + t/10]$
conditionally on $X_i^{(1)}(t)$, the variable $X_i^{(2)}(t)$ is centered Gaussian with variance $(1 + X_i^{(1)}(t)) / (2 + X_i^{(1)}(t))$
the variable $X_i^{(3)}(t)$ is independent of $X_i^{(1)}$ and $X_i^{(2)}$ and is a Bernoulli random variable with success rate 0.6
the irrelevant variables $X_i^{(k)}$, $k = 4, \ldots, 23$, are paths of centered Gaussian processes with covariance function $\operatorname{Cov}\big( X_i^{(k)}(t), X_i^{(k)}(s) \big) = 4 \exp(-|t - s|)$
the irrelevant variables are independent of each other and of the first three variables
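
A minimal sketch of this covariate design for one subject; drawing $X_i^{(3)}$ once per subject and holding it constant over time is one reading of the slide, and all names are illustrative:

```python
# Simulate the 23 covariate paths of one subject on the grid t = 1, ..., 30.
import numpy as np

def simulate_covariates(rng, t):
    x1 = rng.uniform(t / 10, 2 + t / 10)                 # X1(t) ~ U[t/10, 2 + t/10]
    x2 = rng.normal(0.0, np.sqrt((1 + x1) / (2 + x1)))   # X2(t) | X1(t): centered Gaussian
    x3 = np.full(t.size, rng.binomial(1, 0.6))           # X3: Bernoulli(0.6), constant in t
    cov = 4 * np.exp(-np.abs(t[:, None] - t[None, :]))   # Cov(X(t), X(s)) = 4 exp(-|t-s|)
    noise = rng.multivariate_normal(np.zeros(t.size), cov, size=20)  # X4, ..., X23
    return np.vstack([x1, x2, x3, noise])                # 23 x len(t) array

X = simulate_covariates(np.random.default_rng(0), np.arange(1.0, 31.0))
```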

48 three levels of noise for the random error: $\sigma = 1$, $1.25$ and $2$, corresponding to signal-to-noise ratios (SNR) of $6.39$, $5.11$ and $3.19$
the SNR is defined by $\sqrt{\gamma^{\top} Z^{\top} Z \gamma / N} / \sigma$
for each simulated data set: cubic splines with five equidistant internal knots are used
number of simulations: 500

49 reported criteria:
the mean value of the tuning parameter $\lambda$
the average number of variables selected
the average number of truly zero variables that were selected (false positives)
the average number of truly nonzero variables that were not selected (false negatives)
the mean and standard deviation of the model error $(\hat{\gamma} - \gamma)^{\top} Z^{\top} Z (\hat{\gamma} - \gamma) / N$

50 [Table 2: Model selection ability: mean $\lambda$, number of selected variables S, false positives FP, false negatives FN and model error ME (standard deviation in parentheses) for gbridge, gscad, gmcp, glasso 1 and glasso 2; ME standard deviations at $\sigma = 1$: gbridge (0.0033), gscad (0.0142), gmcp (0.0124), glasso 1 (0.0066), glasso 2 (0.0497); at $\sigma = 2$: gbridge (0.0137), gscad (0.0316), gmcp (0.0317), glasso 1 (0.0407), glasso 2 (0.1209)]

51 remarks from this simulation study:
for all signal-to-noise ratios, gbridge and glasso 1 give the best results in selection ability and in model error, compared to the other methods
the gbridge method is better in model error, while the glasso 1 method is better in selection ability
the gscad and gmcp procedures are not very good in selection ability: the number of false positives is rather high
the glasso 2 implementation of the grouped Lasso gives reasonably correct results in model selection but performs very badly in terms of model error

52 typical performance of the estimators of the first four coefficients, for noise level $\sigma = 1.25$
[Figure 4: boxplots of the model error of gbridge, gscad, gmcp and glasso 1 for the three signal-to-noise ratios]

53 THANK YOU!
