Regression When the Predictors Are Images
1 University of Haifa, From the SelectedWorks of Philip T. Reiss, May 2009
Regression When the Predictors Are Images
Philip T. Reiss, New York University
2 Regression When the Predictors Are Images
Philip T. Reiss, New York University
Wright State University, Dayton, Ohio, May 1, 2009
Joint work with Todd Ogden
Research supported in part by grant 5F31MH (NIMH)
3 Simple linear regression: fit the least-squares line. [Figure: scatterplots of y vs. x with fitted lines]
4 Multiple linear regression: fit the least-squares (hyper)plane. [Figure: 3-D scatterplots of y vs. (x1, x2) with fitted planes]
5 Basic model equation for multiple linear regression:

y_i = β_0 + β_1 x_{i1} + ... + β_{p-1} x_{i,p-1} + ε_i,  for i = 1, ..., n,

where
y_i is the ith subject's outcome;
β_0 is the intercept of the true line (or plane ...);
x_{i1}, ..., x_{i,p-1} are subject i's values for predictors 1, ..., p-1;
β_1, ..., β_{p-1} are the coefficients or "effects" of predictors 1, ..., p-1;
ε_i is an error term, which in the nicest case is "iid normal":
1. independent across subjects,
2. identically distributed for each subject,
3. normally distributed.
Each of these 3 conditions can be relaxed.
6 The above system of n equations in p unknowns can be written in inner product form as

y_i = x_i^T β + ε_i,  i = 1, ..., n,

where x_i = (1, x_{i1}, ..., x_{i,p-1})^T and β = (β_0, β_1, ..., β_{p-1})^T, or in matrix form as

y = Xβ + ε,

where X is the n × p matrix with ith row x_i^T (the "design matrix").
The least-squares estimate of β is then β̂ = arg min_{β ∈ R^p} ||y - Xβ||².
If rank(X) = p, then we have the unique minimizer β̂ = (X^T X)^{-1} X^T y.
But if n < p, there are ordinarily infinitely many β ∈ R^p such that ||y - Xβ||² = 0. So we generally seek the optimal β within some reasonable subset of R^p.
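The closed-form least-squares estimate above can be checked numerically. A minimal sketch in Python with simulated data (all variable names are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n subjects with p - 1 predictors (full-rank case, n > p).
n, p = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design matrix: intercept + predictors
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Closed-form least-squares estimate: solve (X^T X) beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate via the numerically preferred least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```

With n much larger than p and little noise, beta_hat lands close to the generating coefficients; the n < p case below is exactly where this recipe breaks down.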
7 A motivating example: serotonin receptors in depression
Major depressive disorder affects 9.5% of the U.S. population each year.
Serotonin (5-HT), a neurotransmitter, is believed to play an important role in the disorder, mediated in part by the distribution of 5-HT_1A receptors in the brain.
5-HT_1A receptor binding potential (BP), a measure of the receptors' availability, can be mapped at each voxel (volume unit) of the brain via PET imaging and tracer kinetic modeling.
Question: How can we relate these BP maps to a depression-related outcome such as the Hamilton Depression Score (HAM-D)?
8 Idea: regress HAM-D (scalar) on BP (image):

y_i = α + s_i^T f + ε_i
9 This can be thought of:
1. as super-high-dimensional regression,
2. as regression with functional data (L² inner product).
Many potential applications in psychiatry, including:
1. predicting treatment response in depression,
2. predicting progression from mild cognitive impairment to Alzheimer's disease.
10 Road map: linear vs. generalized linear regression, crossed with 1-D signal predictors vs. 2-D image predictors.
11 An easier application: near-infrared spectroscopy
[Figure: (a) raw wheat spectra and (b) differenced wheat spectra (log(1/R) vs. wavelength, nm), with the corresponding estimated coefficient functions below.]

We model a vector y of n scalar responses as

y = Xα + Sf + ε,

where
X is an n × p covariate matrix (may just be a column of 1s);
S is an n × N matrix whose ith row s_i^T represents a signal predictor defined at points v_1, ..., v_N, and each column has mean zero;
ε denotes iid errors.
Since n ≪ N, we must reduce the dimension of S to solve for the coefficient function f.
12 Previous approaches to solving for f:

1. Principal component regression (Massy, 1965)
Replace the model y = Xα + Sf + ε with y = Xα + SV_q β + ε, where V_q comprises the q leading columns of V, given the singular value decomposition S = UDV^T; take f̂ = V_q β̂.
13 2. Penalized B-spline expansion
Replace y = Xα + Sf + ε with y = Xα + SBβ + ε, where the columns of B form a B-spline basis; take f̂ = Bβ̂, where (α̂, β̂) minimize the penalized least squares criterion

||y - Xα - SBβ||² + λ β^T P β.

P is chosen so that β^T P β is a measure of the roughness of f = Bβ, e.g., ∫ f''(v)² dv (Cardot, Ferraty, and Sarda, 2003) or a difference penalty (Marx and Eilers, 1999).
The choice of λ > 0 governs the tradeoff between fidelity to the data and smoothness of the coefficient function (higher λ ⇒ smoother).
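A minimal sketch of the penalized basis expansion, with a Gaussian-bump basis standing in for B-splines and a Marx-Eilers-style second-difference penalty (basis choice, sizes, and λ are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, K = 60, 100, 20
v = np.linspace(0.0, 1.0, N)

# A smooth K-column basis standing in for B-splines (Gaussian bumps at equally spaced knots).
knots = np.linspace(0.0, 1.0, K)
B = np.exp(-0.5 * ((v[:, None] - knots[None, :]) / 0.08) ** 2)  # N x K

S = rng.normal(size=(n, N))              # signal matrix
f_true = np.sin(2 * np.pi * v)
y = S @ f_true + 0.1 * rng.normal(size=n)

# Difference penalty: P = D2^T D2 with D2 the second-difference matrix on the coefficients.
D2 = np.diff(np.eye(K), n=2, axis=0)
P = D2.T @ D2

# Minimize ||y - S B beta||^2 + lam * beta^T P beta (no extra covariates X here).
lam = 10.0
beta_hat = np.linalg.solve(B.T @ S.T @ S @ B + lam * P, B.T @ S.T @ y)
f_hat = B @ beta_hat
```

Raising lam shrinks the second differences of beta_hat toward zero, i.e., pulls f_hat toward a linear function, which is the fidelity/smoothness tradeoff described above.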
14 Functional principal component regression
Reiss and Ogden (2007) propose to combine the above two approaches to obtain a coefficient function estimate

f̂ = B V_q ζ̂,

where
B is N × K, with columns forming a B-spline basis as before;
V_q is a K × q matrix whose columns are the leading columns of V, given the singular value decomposition SB = UDV^T.
This functional principal component regression (FPCR) modifies the above penalized least squares criterion ||y - Xα - SBβ||² + λ β^T P β by further restricting to the q leading principal components, to obtain

||y - Xα - SBV_q ζ||² + λ ζ^T V_q^T P V_q ζ.
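Putting the two pieces together, a sketch of the FPCR fit (same illustrative bump basis and difference penalty as above; no covariate matrix X):

```python
import numpy as np

rng = np.random.default_rng(3)
n, N, K, q = 60, 100, 20, 5
v = np.linspace(0.0, 1.0, N)
knots = np.linspace(0.0, 1.0, K)
B = np.exp(-0.5 * ((v[:, None] - knots[None, :]) / 0.08) ** 2)  # stand-in spline basis, N x K

S = rng.normal(size=(n, N))
y = S @ np.sin(2 * np.pi * v) + 0.1 * rng.normal(size=n)

# FPCR: SVD of SB, keep the q leading right singular vectors, penalized fit in zeta.
D2 = np.diff(np.eye(K), n=2, axis=0)
P = D2.T @ D2
_, _, Vt = np.linalg.svd(S @ B, full_matrices=False)
Vq = Vt[:q].T                                   # K x q
Z = S @ B @ Vq                                  # reduced n x q design matrix
lam = 1.0
zeta_hat = np.linalg.solve(Z.T @ Z + lam * Vq.T @ P @ Vq, Z.T @ y)
f_hat = B @ Vq @ zeta_hat                       # coefficient function estimate
```

The key computational point: after the two reductions the normal equations are q × q rather than N × N, even though f_hat is still a full-resolution function of v.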
15 The second dimension reduction, SB → SBV_q, is optimal in the following minimax sense.

Proposition 1. Let UDV^T be the singular value decomposition of the n × K matrix Z, let U_q be the matrix consisting of the first q < min(n, K) columns of U, and let D_q be the q × q upper left submatrix of D. Then M_0 = U_q D_q minimizes

max_{w ∈ R^K, ||w|| = 1} ||Zw - proj_M Zw||

over all n × q matrices M, where proj_M denotes projection onto the column space of M.

In other words: if we wish to replace an n × K design matrix Z = SB with an n × q design matrix M with q < K, the maximum perturbation in fitted values is minimized by taking M to be U_q D_q, which equals SBV_q, the FPCR design matrix.
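Two facts in Proposition 1 are easy to verify numerically: M_0 = U_q D_q coincides with Z V_q, and the perturbation ||Zw - proj_{M_0} Zw|| is bounded by the (q+1)th singular value for any unit w (a quick illustrative check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K, q = 30, 12, 4
Z = rng.normal(size=(n, K))                     # plays the role of SB

U, D, Vt = np.linalg.svd(Z, full_matrices=False)
Uq, Vq = U[:, :q], Vt[:q].T

# The optimal reduced design M0 = U_q D_q is exactly Z V_q:
M0 = Uq @ np.diag(D[:q])
assert np.allclose(Z @ Vq, M0)

# For any unit vector w, the perturbation in fitted values is at most the
# (q+1)th singular value, which is the minimax value achieved by M0.
w = rng.normal(size=K)
w /= np.linalg.norm(w)
resid = Z @ w - Uq @ (Uq.T @ (Z @ w))           # Zw minus its projection onto col(M0)
assert np.linalg.norm(resid) <= D[q] + 1e-9
```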
16 Tuning parameter 1: number of principal components
The q in ||y - Xα - SBV_q ζ||² + λ ζ^T V_q^T P V_q ζ can be chosen by multifold cross-validation:
1. Divide the data points (y_i, x_i, s_i) into, say, 5 validation sets.
2. For given q, and for k = 1, ..., 5: remove the kth validation set and obtain estimates α̂^(-k)(q), ζ̂^(-k)(q) from the remaining data (the "training set"); use those estimates to obtain predicted values ŷ_k^(-k)(q) for the left-out outcomes; obtain the prediction error ||y_k - ŷ_k^(-k)(q)||² for the kth validation set.
3. Choose q such that Σ_{k=1}^{5} ||y_k - ŷ_k^(-k)(q)||² is minimized.
Alternatively, just choose a q that seems large enough to capture the necessary detail.
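The three steps above can be sketched directly; here with an unpenalized q-component fit and a true coefficient function built from three leading PCs (all sizes and the fold scheme are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, N = 100, 50
S = rng.normal(size=(n, N))
_, _, Vt_full = np.linalg.svd(S, full_matrices=False)
f_true = Vt_full[:3].T @ np.array([5.0, -3.0, 2.0])   # built from the 3 leading PCs
y = S @ f_true + 0.5 * rng.normal(size=n)

def cv_error(q, folds=5):
    """Multifold CV prediction error for a q-component (unpenalized) fit."""
    err = 0.0
    idx = np.arange(n)
    for k in range(folds):
        val = idx[k::folds]                    # kth validation set
        tr = np.setdiff1d(idx, val)            # training set
        _, _, Vt = np.linalg.svd(S[tr], full_matrices=False)
        Vq = Vt[:q].T
        beta, *_ = np.linalg.lstsq(S[tr] @ Vq, y[tr], rcond=None)
        err += np.sum((y[val] - S[val] @ Vq @ beta) ** 2)
    return err

q_best = min(range(1, 9), key=cv_error)        # step 3: minimize summed prediction error
```

Underfitting (q below the true number of components) inflates the validation error far more than mild overfitting does, which is why the CV curve typically has a sharp elbow at the right q.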
17 Tuning parameter 2: roughness penalty parameter λ
[Figure: GCV as a function of λ, with "too bumpy," "too smooth," and "just right!" fits indicated.]
Given q, the λ in ||y - Xα - SBV_q ζ||² + λ ζ^T V_q^T P V_q ζ can be chosen by optimizing over λ a criterion function such as:
1. generalized cross-validation (GCV; Craven and Wahba, 1979): related to cross-validation, but does not require fitting the model repeatedly;
2. restricted maximum likelihood (REML; Ruppert, Wand, and Carroll, 2003): relies on the connection with linear mixed models.
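A sketch of the GCV criterion for a penalized fit of this form, using the standard formula GCV(λ) = n · RSS(λ) / (n - tr H(λ))², where H(λ) is the hat matrix (design, penalty, and λ grid are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 80, 20
Z = rng.normal(size=(n, K))                 # reduced design, playing the role of SBV_q
beta_true = np.exp(-0.3 * np.arange(K))
y = Z @ beta_true + rng.normal(size=n)

D2 = np.diff(np.eye(K), n=2, axis=0)
P = D2.T @ D2                               # roughness penalty on the coefficients

def gcv(lam):
    """GCV(lam) = n * RSS / (n - tr H)^2 for the ridge-type fit with penalty lam * P."""
    H = Z @ np.linalg.solve(Z.T @ Z + lam * P, Z.T)   # hat matrix
    rss = np.sum((y - H @ y) ** 2)
    edf = np.trace(H)                        # effective degrees of freedom
    return n * rss / (n - edf) ** 2

grid = 10.0 ** np.arange(-3, 4)
lam_gcv = min(grid, key=gcv)
```

Each evaluation reuses the single fit at that λ, which is the sense in which GCV avoids repeatedly refitting held-out folds.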
18 Reiss and Ogden (2009a) derived a one-parameter family of functions h_k (k ≥ 1) such that

sgn[ (d/dλ) REML(λ) ] = sgn[h_2(λ) - h_1(λ)]  and  sgn[ (d/dλ) GCV(λ) ] = sgn[h_2(λ) - h_3(λ)].

In non-pathological cases, h_1 crosses h_2 (from smaller to larger) at a unique value λ̂_REML, whereas h_3 crosses h_2 (from larger to smaller) at a unique point λ̂_GCV. REML smooths more than GCV iff the latter crossing of h_2 precedes the former one.
[Figure: h_1, h_2, h_3 vs. log(λ), with the GCV and REML crossing points marked.]
19 Some asymptotic results
Assume the outcomes are generated by the model y = α1 + S*Bβ* + ε, where
S* = (s*_1, ..., s*_n)^T; the s*_i = (s*_{i1}, ..., s*_{iN})^T, i = 1, 2, ..., are i.i.d. random vectors with E(s*_{1j}) = 0 and E(s*_{1j}^4) < ∞ for each j = 1, ..., N;
ε is a vector of i.i.d. errors with mean zero and finite variance, independent of S*; and
B is a fixed N × K B-spline basis matrix.
Let V*_q be the K × q population principal component matrix whose columns are the eigenvectors of E(B^T s*_1 s*_1^T B) corresponding to its leading eigenvalues ξ_1 > ... > ξ_q > 0, and let Ξ_q = diag(ξ_1, ..., ξ_q).

Theorem 1. Suppose β* = V*_q ζ* for some ζ* = (ζ*_1, ..., ζ*_q)^T. If β̂_n = V_q ζ̂ denotes a q-component FPCR estimate for which λ_n is chosen to be o_P(n^{1/2}), then n^{1/2}(β̂_n - β*) →_d Z_1 + Z_2, where Z_1 ~ N_K(0, σ² V*_q Ξ_q^{-1} V*_q^T) and Z_2 ~ N_K(0, W) for a K × K matrix W not depending on σ².
20 Under mild assumptions, the λ_n = o_P(n^{1/2}) condition imposed in Theorem 1 is met if λ_n is chosen by GCV or REML. Indeed, a stronger condition then holds:

Theorem 2. Let λ_n be the GCV or REML value associated with the q-component FPCR estimate. Assume that V*_q^T P V*_q is nonsingular, and that f = Bβ* with V*_q^T β* ≠ 0. Then there exists M > 0 such that P(λ_n > M) → 0 as n → ∞.
21 Consistency of choosing the number of components by multifold CV:

Theorem 3. Assume that f = BV*_q ζ* with ζ*_q ≠ 0. Suppose the number of FPCR components is chosen by multifold CV using D_n divisions of the n observations into training and validation sets of sizes n_t and n_v = n - n_t, respectively, with λ_n = o_P(n^{1/2}). If n_t, n_v → ∞ and D_n = o_P[(min{n_t, n_v})^{1/2}], then for any positive integer q_1 < q, the q-component model will be chosen over the q_1-component model with probability tending to 1 as n → ∞.
22 [Figure: (a) raw wheat spectra and (b) differenced wheat spectra (log(1/R) vs. wavelength, nm); (c) coefficient function for moisture by PCR; (d) coefficient function for moisture by FPCR with REML.]
23 Simulation study
Given an n × N signal matrix S (an N-dimensional signal for each of n subjects), create a true coefficient function f ∈ R^N and simulate outcomes y = Sf + ε, where ε consists of n iid normal errors. Then use FPCR to derive the estimate f̂.
Two key performance metrics are:
estimation error ||f̂ - f||²;
prediction error ||ŷ - E(y|S)||² = ||S f̂ - Sf||².
24 Table 1: Average scaled mean squared error of prediction, for smooth and bumpy true coefficient functions on the wheat and gasoline data sets. Methods compared: PBSE (GCV, REML, oracle); PCR and B-spline PCR; FPCR_C; FPCR_R (GCV, REML, oracle, plug-in); PLS and B-spline PLS; FPLS_C; FPLS_R (GCV, REML). [Table values lost in transcription.]
25 Table 2: Mean L² norm of error (root integrated squared error) in estimating f, for the same functions, data sets, and methods as in Table 1. [Table values lost in transcription.]
26 Road map: linear vs. generalized linear regression, crossed with 1-D signal predictors vs. 2-D image predictors.
27 Generalized linear models
Problem: the regression model y_i = x_i^T β + ε_i is not appropriate for certain types of outcome y_i, e.g., if y_i is binary (0/1).
Under the standard assumptions that ε_i is independent of x_i and has expectation 0, µ_i ≡ E(y_i | x_i) = x_i^T β. This motivates the generalized linear model

µ_i = g^{-1}(x_i^T β),

where g is an invertible link function.
Key example: logistic regression. If y_i is binary, µ_i is simply p_i ≡ P(y_i = 1). With inverse link g^{-1}(t) = e^t / (1 + e^t), the above becomes

p_i = exp(x_i^T β) / [1 + exp(x_i^T β)],  i.e.,  y_i ~ Bernoulli( exp(x_i^T β) / [1 + exp(x_i^T β)] ).
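A sketch of fitting the logistic model above by iteratively reweighted least squares, the standard Newton-type algorithm for GLMs (simulated data; sizes and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, 1.5, -1.0])
prob = 1.0 / (1.0 + np.exp(-X @ beta_true))   # p_i = inverse-logit of x_i^T beta
y = rng.binomial(1, prob)                     # Bernoulli outcomes

# Maximum likelihood by iteratively reweighted least squares (Newton's method):
# each step solves a weighted least-squares problem with weights mu_i (1 - mu_i).
beta = np.zeros(p)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
    W = mu * (1.0 - mu)                       # Bernoulli variance weights
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
```

The update is beta += (X^T W X)^{-1} X^T (y - mu), i.e., a Newton step on the Bernoulli log-likelihood; with a FPCR design the same loop applies with X replaced by [X, SBV_q].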
28 FPCR for generalized linear models
Reiss and Ogden (2009b) extended FPCR from the linear model µ = Xα + Sf to the generalized linear model

g(µ) ≡ [g(µ_1), ..., g(µ_n)]^T = Xα + Sf,

where g is an appropriate link function.
Again we apply the restriction f = BV_q ζ to obtain, e.g., the logistic model

y_i ~ Bernoulli( exp[(Xα + SBV_q ζ)_i] / (1 + exp[(Xα + SBV_q ζ)_i]) ).
29 Road map: linear vs. generalized linear regression, crossed with 1-D signal predictors vs. 2-D image predictors.
30 Extension to image predictors
As for the linear model with 1-D predictors, we seek to minimize ||y - Xα - SBV_q ζ||² + λ ζ^T V_q^T P V_q ζ, but must modify B and P.
For the columns of B, we chose radial cubic B-splines (Saranli and Baykal, 1998) centered at an equally spaced grid of knots κ_1, ..., κ_K ∈ R².
We chose P to yield the thin plate penalty, given in two dimensions by

ζ^T V_q^T P V_q ζ = ∫∫ [ (∂²f/∂v_1²)² + 2 (∂²f/∂v_1∂v_2)² + (∂²f/∂v_2²)² ] dv_1 dv_2.
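A discrete analogue of the two-dimensional roughness penalty can be built with Kronecker products of difference matrices on a knot grid; this sketch is an assumption-laden stand-in for the radial-spline thin-plate penalty of the talk, not the authors' construction:

```python
import numpy as np

# Discrete analogue of the thin-plate penalty for coefficients on a k1 x k2 knot grid:
# squared second differences along each axis plus twice the squared mixed differences.
k1, k2 = 8, 8
D1 = np.diff(np.eye(k1), n=2, axis=0)       # second differences along v1
D2 = np.diff(np.eye(k2), n=2, axis=0)       # second differences along v2
E1 = np.diff(np.eye(k1), n=1, axis=0)       # first differences (for the mixed term)
E2 = np.diff(np.eye(k2), n=1, axis=0)

P = (np.kron(np.eye(k2), D1.T @ D1)
     + np.kron(D2.T @ D2, np.eye(k1))
     + 2.0 * np.kron(E2.T @ E2, E1.T @ E1))  # K x K with K = k1 * k2

# A constant coefficient image is perfectly smooth; an oscillating one is not.
flat = np.ones(k1 * k2)
bumpy = np.sin(np.linspace(0.0, 6.0 * np.pi, k1 * k2))
```

Like the continuous thin-plate penalty, this discrete P assigns zero roughness to constant (and separable linear) coefficient images, so the penalty shrinks toward smooth surfaces rather than toward zero.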
31 y_i = α + s_i^T f + ε_i
Key scientific question regarding PET images: where in the brain (at which voxels v) is f(v) significantly positive or negative?
32 This question can be approached through simultaneous confidence bands for the coefficient image.
A 95% (pointwise) confidence interval for f(v) is an interval [L, U] (determined by the data) such that P(L ≤ f(v) ≤ U) = .95.
A 95% simultaneous confidence band for f is given by functions f̂_L(·), f̂_U(·) such that P[f̂_L(v) ≤ f(v) ≤ f̂_U(v) for all v] = .95.
[Figure: a 1-D coefficient function with simultaneous bands, and regions flagged as significantly positive or negative.]
33 No analytic expression is available for f̂_L, f̂_U, so we pursue a bootstrapping approach (Mandel and Betensky, 2008):
1. Using, say, 999 random samples with replacement of n data points (y*, x*, s*) from the n observations, obtain bootstrap estimates f̂*_1, ..., f̂*_999 of the coefficient function.
2. At each v, order the function estimates: f̂*_(1)(v) ≤ ... ≤ f̂*_(999)(v).
3. [f̂*_(25)(v), f̂*_(975)(v)] is a pointwise 95% confidence interval for f(v).
4. ∏_v [f̂*_(25)(v), f̂*_(975)(v)] is in general not a simultaneous 95% confidence band for f; but the Cartesian product of, say, 99% pointwise intervals will be.
5. Given a 95% simultaneous confidence band [f̂_L, f̂_U] for f, the coefficient function is declared significantly positive at all v such that f̂_L(v) > 0.
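The mechanics of steps 1-3 can be sketched as follows; here a plain q-component PCR fit stands in for the full FPCR estimator, 199 resamples replace 999 to keep the loop quick, and no simultaneity adjustment is made (so these are the pointwise intervals of step 3 only):

```python
import numpy as np

rng = np.random.default_rng(8)
n, N, q = 80, 30, 3
S = rng.normal(size=(n, N))
_, _, Vt = np.linalg.svd(S, full_matrices=False)
f_true = Vt[:2].T @ np.array([4.0, -3.0])     # coefficient in the leading PC space
y = S @ f_true + 0.5 * rng.normal(size=n)

def fit(S_, y_):
    """Plain q-component PCR fit, standing in for the full FPCR estimator."""
    _, _, Vt_ = np.linalg.svd(S_, full_matrices=False)
    Vq = Vt_[:q].T
    beta, *_ = np.linalg.lstsq(S_ @ Vq, y_, rcond=None)
    return Vq @ beta

n_boot = 199
fits = np.empty((n_boot, N))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)          # step 1: resample (y_i, s_i) pairs with replacement
    fits[b] = fit(S[idx], y[idx])

fits.sort(axis=0)                             # step 2: order the estimates at each point v
lo, hi = fits[4], fits[194]                   # step 3: ~95% pointwise interval (5th and 195th of 199)
```

As step 4 warns, taking the product of these pointwise intervals over v does not give 95% simultaneous coverage; widening to higher-level pointwise intervals is what restores it.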
34 Oversimplified example
35 More realistic example
Function estimates cross over each other. Simultaneous band formed by the Cartesian product of pointwise intervals running from the 2nd-smallest to the 2nd-largest function estimate at each point:
[Figure]
36 Simulation studies
We used 68 maps of binding potential of 5-HT_1A receptors, obtained by Parsey et al. (2006), to perform two simulation studies:
Study 1: compare the performance of FPCR with two other methods.
Study 2: assess how well the simultaneous inference procedure detects nonzero regions of f.
37 True coefficient image f for simulation studies
38 Simulation study 1: performance comparisons
We compared three methods, for both linear and logistic regression. The methods all begin by restricting the coefficient function to a spline basis, but differ in what happens next.
(a) Roughness penalty only: referred to above as the penalized B-spline expansion.
(b) Reduction to leading PCs only: similar to FPCR, but without a roughness penalty (i.e., λ = 0). Thus the only smoothing parameter is the number of components, which we chose by 5-fold cross-validation from among the values 1, ..., 10.
(c) Leading PCs plus roughness penalty: i.e., FPCR, with 35 PCs (accounting for 96% of the variation in SB).
For both linear and logistic regression, we generated 100 outcome vectors with equally spaced R² ("proportion of explained variation") values from .2 to .95, and computed each method's prediction error.
39 [Figure 1: prediction error vs. R² for (a) linear and (b) logistic regression, comparing all components (roughness penalty only), 1-10 components with λ = 0, and 35 components (FPCR).]
40 Simulation study 2: detecting significance
For R²_L = .5, .6, .7, .8, .9 (Menard, 2000), we simulated 20 sets of binary outcomes

y_i ~ Bernoulli( exp(s_i^T f) / [1 + exp(s_i^T f)] ),  i = 1, ..., 68,

where s_i denotes subject i's binding potential map (1 slice, 5778 voxels), and f = k f_0, where f_0 is the artificial coefficient image shown previously and k is chosen to attain the specified R²_L.
For each set of outcomes, we estimated f by logistic FPCR with 35 PCs, and formed simultaneous confidence bands from 1999 bootstrap samples. Using these bands, we determined how many times (out of 20) each voxel was deemed significantly positive (or negative).
41 [Figure: maps of the number of times (out of 20) each voxel was found significant, for R² = 0.5, 0.6, 0.7, 0.8, 0.9.]
42 f, f̂, and simultaneous bands: 10 examples with R²_L = .7
[Figure: ten panels of coefficient function vs. voxel; per-panel detection counts: 9/9 true with 2 false, 9/9 true with 1 false, 4/9 true, 8/9 true, 6/9 true, 5/9 true, 0/9 true, 5/9 true with 1 false, 9/9 true, 8/9 true with 1 false.]
43 f, f̂, and simultaneous bands: zooming in
[Figure: the same ten panels as the previous slide, zoomed in.]
44 Discussion
Regressing scalar outcomes on entire images is a very challenging problem, but our method seems to do quite well at detecting salient (f ≠ 0) regions.
Extension to 3-D images will require tackling several practical issues.
Another extension: a locally adaptive smoothing parameter.
Ongoing work (Todd Ogden and Yihong Zhao) uses wavelets rather than splines, and seeks a sparse representation of the coefficient function.
45 Thank you!
46 References
[1] Cardot, H., Ferraty, F., and Sarda, P. (2003). Spline estimators for the functional linear model. Statistica Sinica 13.
[2] Craven, P., and Wahba, G. (1979). Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik 31.
[3] Mandel, M., and Betensky, R. A. (2008). Simultaneous confidence intervals based on the percentile bootstrap approach. Computational Statistics and Data Analysis 52.
[4] Marx, B. D., and Eilers, P. H. C. (1999). Generalized linear regression on sampled signals and curves: a P-spline approach. Technometrics 41.
[5] Massy, W. F. (1965). Principal components regression in exploratory statistical research. Journal of the American Statistical Association 60.
[6] Menard, S. (2000). Coefficients of determination for multiple logistic regression analysis. The American Statistician 54.
[7] Parsey, R. V., Oquendo, M. A., Ogden, R. T., Olvet, D. M., Simpson, N., Huang, Y., Van Heertum, R. L., Arango, V., and Mann, J. J. (2006). Altered serotonin 1A binding in major depression: a [carbonyl-C-11]WAY positron emission tomography study. Biological Psychiatry 59.
47 [8] Reiss, P. T., and Ogden, R. T. (2007). Functional principal component regression and functional partial least squares. Journal of the American Statistical Association 102.
[9] Reiss, P. T., and Ogden, R. T. (2009a). Smoothing parameter selection for a class of semiparametric linear models. Journal of the Royal Statistical Society, Series B 71.
[10] Reiss, P. T., and Ogden, R. T. (2009b). Functional generalized linear models with images as predictors. Biometrics, to appear.
[11] Ruppert, D., Wand, M. P., and Carroll, R. J. (2003). Semiparametric Regression. Cambridge: Cambridge University Press.
[12] Saranli, A., and Baykal, B. (1998). Complexity reduction in radial basis function (RBF) networks by using radial B-spline functions. Neurocomputing 18.
More informationTowards a Regression using Tensors
February 27, 2014 Outline Background 1 Background Linear Regression Tensorial Data Analysis 2 Definition Tensor Operation Tensor Decomposition 3 Model Attention Deficit Hyperactivity Disorder Data Analysis
More informationLecture 14: Variable Selection - Beyond LASSO
Fall, 2017 Extension of LASSO To achieve oracle properties, L q penalty with 0 < q < 1, SCAD penalty (Fan and Li 2001; Zhang et al. 2007). Adaptive LASSO (Zou 2006; Zhang and Lu 2007; Wang et al. 2007)
More informationCMSC858P Supervised Learning Methods
CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors
More informationStatistics for high-dimensional data: Group Lasso and additive models
Statistics for high-dimensional data: Group Lasso and additive models Peter Bühlmann and Sara van de Geer Seminar für Statistik, ETH Zürich May 2012 The Group Lasso (Yuan & Lin, 2006) high-dimensional
More informationData analysis strategies for high dimensional social science data M3 Conference May 2013
Data analysis strategies for high dimensional social science data M3 Conference May 2013 W. Holmes Finch, Maria Hernández Finch, David E. McIntosh, & Lauren E. Moss Ball State University High dimensional
More informationRegularization Methods for Additive Models
Regularization Methods for Additive Models Marta Avalos, Yves Grandvalet, and Christophe Ambroise HEUDIASYC Laboratory UMR CNRS 6599 Compiègne University of Technology BP 20529 / 60205 Compiègne, France
More informationReduction of Model Complexity and the Treatment of Discrete Inputs in Computer Model Emulation
Reduction of Model Complexity and the Treatment of Discrete Inputs in Computer Model Emulation Curtis B. Storlie a a Los Alamos National Laboratory E-mail:storlie@lanl.gov Outline Reduction of Emulator
More informationProteomics and Variable Selection
Proteomics and Variable Selection p. 1/55 Proteomics and Variable Selection Alex Lewin With thanks to Paul Kirk for some graphs Department of Epidemiology and Biostatistics, School of Public Health, Imperial
More informationSupport Vector Machines
Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot, we get creative in two
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 12: Logistic regression (v1) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 30 Regression methods for binary outcomes 2 / 30 Binary outcomes For the duration of this
More informationSpatial Process Estimates as Smoothers: A Review
Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationLONGITUDINAL FUNCTIONAL MODELS WITH STRUCTURED PENALTIES
Johns Hopkins University, Dept. of Biostatistics Working Papers 11-2-2012 LONGITUDINAL FUNCTIONAL MODELS WITH STRUCTURED PENALTIES Madan G. Kundu Indiana University School of Medicine, Department of Biostatistics
More informationChapter 7: Model Assessment and Selection
Chapter 7: Model Assessment and Selection DD3364 April 20, 2012 Introduction Regression: Review of our problem Have target variable Y to estimate from a vector of inputs X. A prediction model ˆf(X) has
More informationComputational methods for mixed models
Computational methods for mixed models Douglas Bates Department of Statistics University of Wisconsin Madison March 27, 2018 Abstract The lme4 package provides R functions to fit and analyze several different
More informationNonparametric Regression. Badr Missaoui
Badr Missaoui Outline Kernel and local polynomial regression. Penalized regression. We are given n pairs of observations (X 1, Y 1 ),...,(X n, Y n ) where Y i = r(x i ) + ε i, i = 1,..., n and r(x) = E(Y
More informationCan we do statistical inference in a non-asymptotic way? 1
Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.
More informationBayesian Grouped Horseshoe Regression with Application to Additive Models
Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu 1,2, Daniel F. Schmidt 1, Enes Makalic 1, Guoqi Qian 2, John L. Hopper 1 1 Centre for Epidemiology and Biostatistics,
More informationSmooth Common Principal Component Analysis
1 Smooth Common Principal Component Analysis Michal Benko Wolfgang Härdle Center for Applied Statistics and Economics benko@wiwi.hu-berlin.de Humboldt-Universität zu Berlin Motivation 1-1 Volatility Surface
More informationHigh-Dimensional Statistical Learning: Introduction
Classical Statistics Biological Big Data Supervised and Unsupervised Learning High-Dimensional Statistical Learning: Introduction Ali Shojaie University of Washington http://faculty.washington.edu/ashojaie/
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate
More informationThe Hodrick-Prescott Filter
The Hodrick-Prescott Filter A Special Case of Penalized Spline Smoothing Alex Trindade Dept. of Mathematics & Statistics, Texas Tech University Joint work with Rob Paige, Missouri University of Science
More informationA Confidence Region Approach to Tuning for Variable Selection
A Confidence Region Approach to Tuning for Variable Selection Funda Gunes and Howard D. Bondell Department of Statistics North Carolina State University Abstract We develop an approach to tuning of penalized
More informationSpectral Regularization
Spectral Regularization Lorenzo Rosasco 9.520 Class 07 February 27, 2008 About this class Goal To discuss how a class of regularization methods originally designed for solving ill-posed inverse problems,
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationRidge Regression. Flachs, Munkholt og Skotte. May 4, 2009
Ridge Regression Flachs, Munkholt og Skotte May 4, 2009 As in usual regression we consider a pair of random variables (X, Y ) with values in R p R and assume that for some (β 0, β) R +p it holds that E(Y
More informationExact Likelihood Ratio Tests for Penalized Splines
Exact Likelihood Ratio Tests for Penalized Splines By CIPRIAN CRAINICEANU, DAVID RUPPERT, GERDA CLAESKENS, M.P. WAND Department of Biostatistics, Johns Hopkins University, 615 N. Wolfe Street, Baltimore,
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationFlexible penalized regression for functional data...and other complex data objects
University of Haifa From the SelectedWorks of Philip T. Reiss November, 2015 Flexible penalized regression for functional data...and other complex data objects Philip T. Reiss Available at: https://works.bepress.com/phil_reiss/41/
More informationFunctional Latent Feature Models. With Single-Index Interaction
Generalized With Single-Index Interaction Department of Statistics Center for Statistical Bioinformatics Institute for Applied Mathematics and Computational Science Texas A&M University Naisyin Wang and
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer
More informationAn Introduction to Functional Data Analysis
An Introduction to Functional Data Analysis Chongzhi Di Fred Hutchinson Cancer Research Center cdi@fredhutch.org Biotat 578A: Special Topics in (Genetic) Epidemiology November 10, 2015 Textbook Ramsay
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationStatistics 203: Introduction to Regression and Analysis of Variance Penalized models
Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance
More informationRegularization via Spectral Filtering
Regularization via Spectral Filtering Lorenzo Rosasco MIT, 9.520 Class 7 About this class Goal To discuss how a class of regularization methods originally designed for solving ill-posed inverse problems,
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationStatistical Methods for SVM
Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,
More informationSmall Area Estimation for Skewed Georeferenced Data
Small Area Estimation for Skewed Georeferenced Data E. Dreassi - A. Petrucci - E. Rocco Department of Statistics, Informatics, Applications "G. Parenti" University of Florence THE FIRST ASIAN ISI SATELLITE
More informationTwo Applications of Nonparametric Regression in Survey Estimation
Two Applications of Nonparametric Regression in Survey Estimation 1/56 Jean Opsomer Iowa State University Joint work with Jay Breidt, Colorado State University Gerda Claeskens, Université Catholique de
More informationSmoothing parameter selection and outliers accommodation for smoothing splines
Int. Statistical Inst.: Proc. 58th World Statistical Congress, 011, Dublin (Session CPS008) p.6037 Smoothing parameter selection and outliers accommodation for smoothing splines Osorio, Felipe Universidad
More informationGeneral Functional Concurrent Model
General Functional Concurrent Model Janet S. Kim Arnab Maity Ana-Maria Staicu June 17, 2016 Abstract We propose a flexible regression model to study the association between a functional response and multiple
More informationEffective Computation for Odds Ratio Estimation in Nonparametric Logistic Regression
Communications of the Korean Statistical Society 2009, Vol. 16, No. 4, 713 722 Effective Computation for Odds Ratio Estimation in Nonparametric Logistic Regression Young-Ju Kim 1,a a Department of Information
More informationSTK-IN4300 Statistical Learning Methods in Data Science
Outline of the lecture STK-I4300 Statistical Learning Methods in Data Science Riccardo De Bin debin@math.uio.no Model Assessment and Selection Cross-Validation Bootstrap Methods Methods using Derived Input
More information4 Bias-Variance for Ridge Regression (24 points)
Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationCovariate Adjusted Varying Coefficient Models
Biostatistics Advance Access published October 26, 2005 Covariate Adjusted Varying Coefficient Models Damla Şentürk Department of Statistics, Pennsylvania State University University Park, PA 16802, U.S.A.
More informationAdaptive Piecewise Polynomial Estimation via Trend Filtering
Adaptive Piecewise Polynomial Estimation via Trend Filtering Liubo Li, ShanShan Tu The Ohio State University li.2201@osu.edu, tu.162@osu.edu October 1, 2015 Liubo Li, ShanShan Tu (OSU) Trend Filtering
More information