Aspects of Feature Selection in Mass Spec Proteomic Functional Data
Phil Brown, University of Kent
Newton Institute, 11th December 2006
Based on work with Jeff Morris and others at MD Anderson, Houston, including Keith Baggerly and Kevin Coombes, and with Jim Griffin (Warwick).
Typeset by FoilTEX
Plan of Talk
- Background to proteomics data: aims
- Peak detection versus functional modelling
- Use of wavelets and the linear (mixed) model
- Slab-and-spike shrinkage and selection
- Forms of MCMC posterior inference
- Scale mixtures of normals as priors
- Algorithms: MCMC/MAP
Issues
- Responses versus factors: what is random? What is fixed? What is the direction of causation?
- Aims: prediction/discrimination or explanation?
Mice SELDI-TOF Cancer Data (MD Anderson)
Spectra of serum extracted from mice and processed by time-of-flight mass spectrometry. Each biological sample is characterised by its experimental conditions:
- one of 2 cancer cell lines, A375P or PC3MM2;
- in one of 2 organs, either brain or lung;
- 2 spectra per mouse, one at a low and one at a high laser intensity setting.
Y: spectra; X: factors. 16 mice, 32 spectra.
It is natural to model the spectra as responses to the experimental factors. We use the part of the spectrum between 2000 and … Daltons, treating the response as a curve or, once discretised, as a 7985-dimensional vector.
Strategies for Analysis
- Preprocess to remove baseline differences, normalise and align (see, for example, the Bioconductor packages); interpolate to a 2000 grid with equal spacing on the time scale.
- Afterwards, peaks are commonly found by some form of signal-to-noise evaluation.
- The mainstream approach would then reverse the direction of causation, using logistic or other regression on the peak intensities.
- Our approach is rather to model the spectra themselves using wavelets. This leaves open the possibility of later identifying peaks, and of using Bayes' theorem for discrimination.
Functional Mixed Models
For the $i$th sample spectrum $Y_i(t)$, $i = 1, \dots, n$, thought of as a continuous function of time $t$:
$$Y_i(t) = \sum_{l=1}^{p} X_{il} B_l(t) + \sum_{k=1}^{m} Z_{ik} U_k(t) + E_i(t),$$
or, in reality, as a discretised version in terms of the set of $n$ observations on a grid of $T$ time points, the $n \times T$ matrix $Y$:
$$Y = XB + ZU + E.$$
The rows of $U$ are iid $\mathrm{MVN}_T(0, Q)$ and the rows of $E$ are iid $\mathrm{MVN}_T(0, S)$.
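To make the matrix form concrete, here is a minimal NumPy sketch that simulates $Y = XB + ZU + E$ for a design shaped like the mouse study (16 mice, 32 spectra). Everything not stated on the slides is a hypothetical choice for illustration: the even allocation of mice to the cell line by organ groups, the -1/+1 coding with main effects only, the toy grid of $T = 64$ points, the bump-shaped effect curves in `B`, and the diagonal $Q$ and $S$.

```python
import numpy as np

def mouse_design(n_mice=16):
    """Return X (spectra x fixed effects) and Z (spectra x mice).

    Hypothetical layout: mice spread evenly over the 2 x 2 cell line x organ
    groups, each mouse giving one low- and one high-laser spectrum; factors
    coded -1/+1, main effects only.
    """
    rows_x, rows_z = [], []
    for mouse in range(n_mice):
        cell = 1.0 if mouse % 2 else -1.0          # A375P = -1, PC3MM2 = +1
        organ = 1.0 if (mouse // 2) % 2 else -1.0  # brain = -1, lung = +1
        for laser in (-1.0, 1.0):                  # low, high laser intensity
            rows_x.append([1.0, cell, organ, laser])
            z = np.zeros(n_mice)
            z[mouse] = 1.0                         # this spectrum belongs to this mouse
            rows_z.append(z)
    return np.array(rows_x), np.array(rows_z)

def simulate_fmm(X, Z, B, Q_chol, S_chol, rng):
    """Draw Y = XB + ZU + E with rows of U ~ MVN_T(0, Q) and rows of E ~ MVN_T(0, S).

    Q_chol and S_chol are Cholesky factors: Q = Q_chol @ Q_chol.T, S = S_chol @ S_chol.T.
    """
    T = B.shape[1]
    U = rng.normal(size=(Z.shape[1], T)) @ Q_chol.T
    E = rng.normal(size=(X.shape[0], T)) @ S_chol.T
    return X @ B + Z @ U + E

rng = np.random.default_rng(0)
X, Z = mouse_design()
T = 64                                             # toy grid; the real spectra are much longer
t = np.linspace(0.0, 1.0, T)
B = np.vstack([np.exp(-((t - c) / 0.05) ** 2) for c in (0.3, 0.5, 0.6, 0.8)])  # hypothetical curves
Y = simulate_fmm(X, Z, B, 0.3 * np.eye(T), 0.1 * np.eye(T), rng)
print(X.shape, Z.shape, Y.shape)                   # (32, 4) (32, 16) (32, 64)
```

Each row of `Z` has a single 1, so every spectrum from the same mouse shares that mouse's random curve $U_k(t)$, which is what induces the within-mouse correlation.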
The $T \times T$ matrices $Q$ and $S$ are to be specified. Orthogonally transform to wavelets by applying the Discrete Wavelet Transform (DWT) to each sample (each row of $Y$):
$$YW = XBW + ZUW + EW,$$
where $W$ is an orthogonal matrix. Writing $Y^* = YW$, $B^* = BW$, $U^* = UW$ and $E^* = EW$, the model becomes
$$Y^* = XB^* + ZU^* + E^*.$$
Here also $Q^* = W^T Q W$ and $S^* = W^T S W$, which we take to be diagonal.
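A small NumPy sketch of the orthogonal transform: it builds an explicit Haar DWT matrix $W$ (Haar is chosen here only because its matrix is easy to write down; a real analysis may use other wavelets and a fast pyramid algorithm instead of a matrix) and checks that $W$ is orthogonal, that the model keeps its form after right-multiplying by $W$, and that white noise stays white. The random effects term is omitted for brevity; it transforms in exactly the same way.

```python
import numpy as np

def haar_matrix(T):
    """Orthogonal T x T Haar DWT matrix W (T a power of 2), applied to rows as Y @ W."""
    if T == 1:
        return np.array([[1.0]])
    H = haar_matrix(T // 2).T                       # rows = coarser basis vectors
    smooth = np.kron(H, [1.0, 1.0]) / np.sqrt(2.0)  # coarser vectors, upsampled
    detail = np.kron(np.eye(T // 2), [1.0, -1.0]) / np.sqrt(2.0)  # finest-scale details
    return np.vstack([smooth, detail]).T            # columns are the wavelet basis vectors

rng = np.random.default_rng(1)
T, n, p = 16, 6, 2
W = haar_matrix(T)
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, T))
E = 0.1 * rng.normal(size=(n, T))
Y = X @ B + E

Y_star, B_star, E_star = Y @ W, B @ W, E @ W        # DWT of each row
S = 0.01 * np.eye(T)                                # white-noise covariance
S_star = W.T @ S @ W                                # stays diagonal

print(np.allclose(W.T @ W, np.eye(T)))              # True: W is orthogonal
print(np.allclose(Y_star, X @ B_star + E_star))     # True: model keeps its form
print(np.allclose(Y_star @ W.T, Y))                 # True: inverse DWT recovers Y
```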
Prior for the Transformed Fixed Effects
We employ a slab-and-spike prior here to shrink nonlinearly towards zero: for the $l$th fixed effect, $l = 1, \dots, p$, at scale $j$ and location $k$,
$$B^*_{ljk} = \gamma_{ljk}\, \mathrm{Normal}(0, \tau_{ljk}) + (1 - \gamma_{ljk})\, \delta_0 .$$
Here $\gamma_{ljk} \sim \mathrm{Bernoulli}(\pi_{lj})$; that is, for the $l$th fixed effect the probability of inclusion is allowed to depend on the scale $j$ but not on the locations $k$ within the scale.
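A minimal sketch of one prior draw for a single fixed effect $l$, assuming a dyadic layout where scale $j$ has $2^j$ locations. The inclusion probabilities and slab variances below are hypothetical numbers, and holding $\tau$ fixed within a scale is a simplification (the prior allows $\tau_{ljk}$ to vary with $k$).

```python
import numpy as np

def sample_spike_slab(pi_by_scale, tau_by_scale, rng):
    """One draw of B*_{jk} for a single fixed effect.

    gamma_{jk} ~ Bernoulli(pi_j); if included, B*_{jk} ~ Normal(0, tau_j), where
    tau_j is a variance. Otherwise B*_{jk} = 0 (the point mass delta_0).
    """
    coeffs = []
    for j, (pi_j, tau_j) in enumerate(zip(pi_by_scale, tau_by_scale)):
        n_loc = 2 ** j                               # locations k within scale j
        gamma = rng.random(n_loc) < pi_j             # same pi_j for every k at this scale
        slab = rng.normal(scale=np.sqrt(tau_j), size=n_loc)
        coeffs.append(np.where(gamma, slab, 0.0))
    return coeffs

rng = np.random.default_rng(2)
pi_by_scale = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]       # hypothetical: coarse scales more often kept
tau_by_scale = [4.0, 2.0, 1.0, 0.5, 0.25, 0.1]     # hypothetical slab variances
draws = [sample_spike_slab(pi_by_scale, tau_by_scale, rng) for _ in range(2000)]
inclusion = [np.mean([np.mean(d[j] != 0.0) for d in draws]) for j in range(len(pi_by_scale))]
print(np.round(inclusion, 2))                      # close to pi_by_scale
```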
The slab-and-spike prior may be thought of as a particular, extreme case of a scale mixture of normals, that is, a mixture over the variance of a normal:
$$\pi(\beta_l) = \int N(\beta_l \mid 0, \psi_l)\, G(d\psi_l). \qquad (1)$$
$G(\cdot)$ weights the variances $\psi$, with high probability on small variances. More on this later.
Analysis is carried out at the wavelet-transformed level using MCMC, with empirical Bayes plug-ins for the shrinkage hyperparameters; the results are transformed back with the inverse DWT to the original parameters for posterior inference.
General Scale Mixtures of Normals
The slab-and-spike prior is an extreme example of a scale mixture of normals. Suppose we wish to replace it by a scale mixture of normals that concentrates mass around zero but also has fat tails, so that any really sizeable coefficient is not shrunk too much. We hope to use such priors to speed up the search for a good model by concentrating on posterior modes rather than running full MCMC. We look for a prior whose negative log provides a suitable penalty to add to the negative log-likelihood of a variety of models, including logistic discrimination (e.g. modelling disease given the spectrum).
Generalised Linear Model for Many Variables
Strategy: minimise
$$-\mathrm{loglik}(\beta) - \log[\pi(\beta)].$$
For example, this could be the logistic likelihood with wavelet coefficients as explanatory variables. The prior is a continuous mixture over the variance of a normal:
$$\pi(\beta_i) = \int N(\beta_i \mid 0, \psi_i)\, G(d\psi_i), \qquad (2)$$
where $G(\cdot)$ weights the variances $\psi$, with high probability on small variances.
Normal Variance Mixtures
The mean-zero double exponential distribution, $\mathrm{DE}(0, 1/\gamma)$, with probability density function
$$\frac{1}{2\gamma}\exp\{-|\beta|/\gamma\}, \qquad -\infty < \beta < \infty,\ 0 < \gamma < \infty,$$
is defined by an exponential mixing distribution, $\mathrm{Ex}\left(\frac{1}{2\gamma^2}\right)$, with probability density function
$$g(\psi_i) = \frac{1}{2\gamma^2}\exp\left\{-\psi_i/[2\gamma^2]\right\}. \qquad (3)$$
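A quick Monte Carlo check of (3), as a sketch assuming NumPy: draw $\psi_i$ from the exponential with rate $1/(2\gamma^2)$ (NumPy parameterises by the scale, i.e. the mean $2\gamma^2$), then $\beta_i \mid \psi_i \sim N(0, \psi_i)$. For $\mathrm{DE}(0, 1/\gamma)$ we expect $E|\beta| = \gamma$ and $\mathrm{Var}(\beta) = 2\gamma^2$. The value $\gamma = 0.7$ is arbitrary.

```python
import numpy as np

def sample_normal_exponential(gamma, size, rng):
    """Draw beta from the normal scale mixture with psi ~ Ex(1 / (2 gamma^2))."""
    psi = rng.exponential(scale=2.0 * gamma**2, size=size)  # numpy scale = 1 / rate
    return rng.normal(loc=0.0, scale=np.sqrt(psi))          # beta | psi ~ N(0, psi)

rng = np.random.default_rng(0)
gamma = 0.7
beta = sample_normal_exponential(gamma, 200_000, rng)

# DE(0, 1/gamma) has density exp(-|b| / gamma) / (2 gamma): E|beta| = gamma, Var = 2 gamma^2.
print(np.mean(np.abs(beta)), gamma)
print(np.var(beta), 2 * gamma**2)
```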
The normal-Jeffreys (NJ) prior distribution arises from the improper hyperprior (Kiiveri; Figueiredo)
$$g(\psi_i) \propto \frac{1}{\psi_i}, \qquad (4)$$
which in turn induces an improper prior for $\beta_i$ of the form $\pi(\beta_i) \propto 1/|\beta_i|$. The prior has an infinite spike at $\beta_i = 0$, a feature shared by the penalised likelihood or posterior, which is consequently unnormalisable.
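Normal Exponential Gamma (NEG)
Mixing on the scale of the exponential distribution with a gamma distribution gives
$$g(\psi_i) = \frac{\lambda}{\gamma^2}\left(1 + \frac{\psi_i}{\gamma^2}\right)^{-(\lambda+1)}, \qquad 0 < \lambda, \gamma < \infty. \qquad (5)$$
The density of the marginal distribution of $\beta_i$ can be expressed as
$$\pi(\beta_i) = \frac{\lambda\, 2^{\lambda}\, \Gamma(\lambda + 1/2)}{\sqrt{\pi}\,\gamma} \exp\left\{\frac{\beta_i^2}{4\gamma^2}\right\} D_{-2(\lambda+1/2)}\left(\frac{|\beta_i|}{\gamma}\right), \qquad (6)$$
where $D_\nu(\cdot)$ is the parabolic cylinder function. The parameter $\lambda$ controls the heaviness of the tails: for large $|\beta_i|/\gamma$,
$$\pi(\beta_i) \approx c \left(\frac{|\beta_i|}{\gamma}\right)^{-(2\lambda+1)}.$$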
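As a numerical sanity check of (6), the sketch below (assuming SciPy) evaluates the closed form with `scipy.special.pbdv`, which returns $D_v(x)$ and its derivative, and compares it with direct numerical integration of $N(\beta \mid 0, \psi)\, g(\psi)$ over $\psi$ using (5). The chosen $\lambda$ values and evaluation points are arbitrary; $\lambda = 0.5$ is the quasi-Cauchy case.

```python
import numpy as np
from scipy import integrate, special

def neg_mixing_density(psi, lam, gamma):
    """Equation (5): density of the variance psi."""
    return (lam / gamma**2) * (1.0 + psi / gamma**2) ** (-(lam + 1.0))

def neg_density_closed_form(beta, lam, gamma):
    """Equation (6), with D_v from scipy.special.pbdv (returns D_v(x) and D_v'(x))."""
    x = np.abs(beta) / gamma
    d, _ = special.pbdv(-2.0 * (lam + 0.5), x)
    const = lam * 2.0**lam * special.gamma(lam + 0.5) / (np.sqrt(np.pi) * gamma)
    return const * np.exp(x**2 / 4.0) * d

def neg_density_by_mixing(beta, lam, gamma):
    """Integrate N(beta | 0, psi) g(psi) over psi numerically."""
    def integrand(psi):
        normal = np.exp(-beta**2 / (2.0 * psi)) / np.sqrt(2.0 * np.pi * psi)
        return normal * neg_mixing_density(psi, lam, gamma)
    value, _ = integrate.quad(integrand, 0.0, np.inf)
    return value

for lam in (0.1, 0.5):
    for b in (0.5, 1.0, 2.0, 4.0):
        print(lam, b, neg_density_closed_form(b, lam, 1.0), neg_density_by_mixing(b, lam, 1.0))
```

At $\beta = 0$ the closed form reduces to $\Gamma(\lambda + 1/2) / (\sqrt{2}\,\gamma\,\Gamma(\lambda))$, which for $\lambda = 0.5$, $\gamma = 1$ is the standard normal density at zero.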
The case $\lambda = 0.5$ corresponds to the quasi-Cauchy slab of Johnstone and Silverman (Annals of Statistics, 2005), who focus on the median of the posterior distribution. This prior of Johnstone and Silverman coincides with the univariate special case of the robustness prior of Berger (1985). Jeffreys (1939; 3rd edition 1961) also has desiderata in hypothesis testing that lead to a prior with Cauchy tails.
Shrinkage of β
Define the penalty function $p(\beta) = -\log[\pi(\beta)]$. For a single regressor, the penalised MLE $\tilde\beta$ and the MLE $\hat\beta$ satisfy
$$\frac{\hat\beta - \tilde\beta}{\sigma^2 / \sum_{i=1}^n x_i^2} = \mathrm{sign}(\tilde\beta)\, p'(|\tilde\beta|),$$
which shows that the amount of shrinkage is directly controlled by the derivative of the penalty function.
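The shrinkage relation can be explored numerically. This sketch (assuming NumPy/SciPy) minimises $(\hat\beta - b)^2/(2v) + p(|b|)$ with $v = \sigma^2/\sum x_i^2$ for the double exponential penalty $|b|/\gamma$ and for the NEG penalty from (6) with constants dropped. A grid search is used instead of a local optimiser because the NEG penalty is not convex. The values $v = 1$, $\gamma = 1$ and $\lambda = 0.5$ are illustrative only.

```python
import numpy as np
from scipy import special

def lasso_penalty(b, gamma):
    """-log DE(0, 1/gamma) density, up to a constant."""
    return np.abs(b) / gamma

def neg_penalty(b, lam, gamma):
    """-log of the NEG density (6), up to an additive constant."""
    x = np.abs(b) / gamma
    d, _ = special.pbdv(-2.0 * (lam + 0.5), x)
    return -(x**2 / 4.0 + np.log(d))

def penalised_estimate(beta_hat, v, penalty, n_grid=20001):
    """Minimise (beta_hat - b)^2 / (2 v) + p(|b|) by grid search over b.

    For a penalty increasing in |b| the minimiser lies between 0 and beta_hat,
    so only [0, |beta_hat|] is searched and the sign of beta_hat is restored.
    """
    a = abs(beta_hat)
    grid = np.linspace(0.0, a, n_grid)
    objective = (a - grid) ** 2 / (2.0 * v) + penalty(grid)
    return np.sign(beta_hat) * grid[np.argmin(objective)]

v = 1.0
lasso = lambda b: lasso_penalty(b, gamma=1.0)
neg = lambda b: neg_penalty(b, lam=0.5, gamma=1.0)
for beta_hat in (0.5, 2.0, 5.0, 20.0):
    shrink_lasso = beta_hat - penalised_estimate(beta_hat, v, lasso)
    shrink_neg = beta_hat - penalised_estimate(beta_hat, v, neg)
    print(beta_hat, round(shrink_lasso, 3), round(shrink_neg, 3))
```

The lasso shrinks every large estimate by the constant $v/\gamma$, whereas the NEG shrinkage decays roughly like $v(2\lambda + 1)/|\tilde\beta|$, matching the tail behaviour of (6).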
- Jeffreys prior: no shrinkage for large β (unbiased), since its penalty $p(\beta) = \log|\beta|$ (up to a constant) has derivative $1/|\beta| \to 0$.
- Oracle property of Fan and Li (2001, JASA).
Parameter Updates in n Rather Than k Dimensions
Use the singular value decomposition of the $n \times k$ matrix $X$: $X = UDV^T$. If $\gamma = V^T \beta$, then we may update the distribution of $\gamma$, simulating with MCMC in $n \ll k$ dimensions, and then translate back into $\beta$ via the full $k$-dimensional augmented vector $(\gamma, \gamma_\perp)$. Here the $(k - n) \times 1$ vector $\gamma_\perp$ is independent of the data, so its distribution comes from the prior, and the conditional distribution of $\gamma_\perp$ given $\gamma$ is easy to calculate.
- Avoids inversion of $k \times k$ matrices (very large).
- Min(n, k) non-zero solutions.
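A sketch of the $n$-dimensional update, conditional on the mixing variances $\psi$ so that the prior is $\beta \sim N(0, \mathrm{diag}(\psi))$ and the likelihood is Gaussian. Since $X\beta = UD\gamma$, the data enter only through $\gamma$; we draw $\gamma$ from its $n$-dimensional posterior and then fill in the rest of $\beta$ from the prior given $V^T\beta = \gamma$. For that last step this sketch uses Matheron's rule (shift an unconditional prior draw so that the constraint holds), a standard Gaussian conditioning trick chosen here because it needs only $n \times n$ solves; it is not necessarily how the original implementation does it. Full row rank of $X$ and the values of `n`, `k`, `sigma2` and `psi` are assumptions.

```python
import numpy as np

def gamma_posterior(X, y, psi, sigma2):
    """Posterior of gamma = V^T beta, using only n x n matrices (n < k, X full row rank)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(d) V^T, V is k x n
    V = Vt.T
    PsiV = psi[:, None] * V                             # diag(psi) V
    A = V.T @ PsiV                                      # prior covariance of gamma
    prec = np.diag(d**2) / sigma2 + np.linalg.inv(A)    # posterior precision of gamma
    mean = np.linalg.solve(prec, d * (U.T @ y) / sigma2)
    return mean, prec, V, PsiV, A

def posterior_mean_beta(X, y, psi, sigma2):
    """E[beta | y] = diag(psi) V A^{-1} E[gamma | y], because y depends on beta only via gamma."""
    mean, _, _, PsiV, A = gamma_posterior(X, y, psi, sigma2)
    return PsiV @ np.linalg.solve(A, mean)

def posterior_draw_beta(X, y, psi, sigma2, rng):
    """Draw gamma in n dimensions, then the remaining directions from the prior."""
    mean, prec, V, PsiV, A = gamma_posterior(X, y, psi, sigma2)
    L = np.linalg.cholesky(prec)
    gamma = mean + np.linalg.solve(L.T, rng.normal(size=mean.size))  # covariance prec^{-1}
    beta_prior = rng.normal(scale=np.sqrt(psi))                      # unconditional prior draw
    # Matheron's rule: adjust the prior draw so that V^T beta equals gamma exactly.
    return beta_prior + PsiV @ np.linalg.solve(A, gamma - V.T @ beta_prior)

rng = np.random.default_rng(3)
n, k, sigma2 = 20, 300, 0.5
X = rng.normal(size=(n, k))
y = rng.normal(size=n)
psi = rng.gamma(2.0, 0.5, size=k)                       # hypothetical prior variances

beta_bar = posterior_mean_beta(X, y, psi, sigma2)
# Direct k x k formula, affordable here only because k is small.
direct = np.linalg.solve(X.T @ X / sigma2 + np.diag(1.0 / psi), X.T @ y / sigma2)
print(np.allclose(beta_bar, direct))                    # True
draw = posterior_draw_beta(X, y, psi, sigma2, rng)
```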
Perfect Starts
For $n < k$ there will be many local boundary solutions. We explore the multiplicity of these solutions by generating alternative starting values, all of which fit the data perfectly. The minimum-length least squares (MLLS) fit to the data for $n < k$ (the limit of ridge regression as the ridge constant tends to zero) is
$$\hat\beta_{MLLS} = (X^T X)^+ X^T y,$$
where $^+$ denotes the Moore-Penrose generalised inverse. Using the singular value decomposition $X = U\,\mathrm{Diag}(d_1, d_2, \dots, d_n)\,V^T$, the orthogonal projection matrix
$$I - P = I_k - VV^T$$
has rank $k - n$.
Consider generating a random $k$-vector $z$ and taking $w = (I_k - VV^T)z$; adding $w$ to $\hat\beta_{MLLS}$ gives another perfectly fitting starting point, since $Xw = 0$. Starting from it may lead to a different posterior mode, which will be close to perfectly fitting but with a different set of $\beta$s set to zero. We can thus explore the modes of the posterior, seeing which $\beta$ components come up regularly.
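The construction of perfect starts can be sketched in a few lines of NumPy. It computes $\hat\beta_{MLLS}$ with `np.linalg.pinv(X) @ y` (mathematically equal to $(X^TX)^+X^Ty$ but numerically safer than pseudo-inverting $X^TX$), then adds projections of random vectors onto the null space of $X$. The `scale` of the random vectors is a hypothetical tuning choice.

```python
import numpy as np

def perfect_starts(X, y, n_starts, rng, scale=1.0):
    """MLLS fit plus alternative starting points that all satisfy X beta = y.

    Assumes n < k and X of full row rank.
    """
    k = X.shape[1]
    beta_mlls = np.linalg.pinv(X) @ y                  # equals (X^T X)^+ X^T y
    _, _, Vt = np.linalg.svd(X, full_matrices=False)   # rows of Vt span the row space of X
    starts = []
    for _ in range(n_starts):
        z = rng.normal(scale=scale, size=k)
        w = z - Vt.T @ (Vt @ z)                        # w = (I_k - V V^T) z, so X w = 0
        starts.append(beta_mlls + w)
    return beta_mlls, np.array(starts)

rng = np.random.default_rng(4)
X = rng.normal(size=(15, 200))
y = rng.normal(size=15)
beta_mlls, starts = perfect_starts(X, y, n_starts=5, rng=rng)
print(np.allclose(X @ starts.T, y[:, None]))                                  # True
print(np.all(np.linalg.norm(starts, axis=1) >= np.linalg.norm(beta_mlls)))   # True
```

Because $\hat\beta_{MLLS}$ lies in the row space of $X$ and $w$ is orthogonal to it, $\|\hat\beta_{MLLS} + w\|^2 = \|\hat\beta_{MLLS}\|^2 + \|w\|^2$, which is why the MLLS fit is the shortest of all perfect fits.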
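Algorithms
It is easy to show that for hierarchical priors
$$\frac{\partial p}{\partial \beta} = E_{\psi \mid \beta}\left\{-\frac{\partial \log f(\beta \mid \psi)}{\partial \beta}\right\},$$
and for a Gaussian mixture prior
$$E_{\psi \mid \beta}\{\psi^{-1}\} = \frac{1}{\beta}\frac{\partial p}{\partial \beta},$$
giving a direct link between Newton-Raphson algorithms and EM, and allowing a direct extension to exponential family likelihoods.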
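The second identity gives the E-step of an EM algorithm for the posterior mode with a Gaussian likelihood: replace $\psi_j^{-1}$ by $p'(|\beta_j|)/|\beta_j|$, then solve a ridge problem. The sketch below writes the M-step as $\Omega(\Omega X^TX\Omega + \sigma^2 I)^{-1}\Omega X^Ty$ with $\Omega^2 = \mathrm{diag}(|\beta_j|/p'(|\beta_j|))$, a rearrangement that stays finite when some $\beta_j = 0$. It assumes $\sigma^2$ is known and that the caller supplies `omega2(a)` $= |\beta|/p'(|\beta|)$: $\gamma|\beta|$ for the DE prior, $\beta^2$ for normal-Jeffreys. With orthonormal columns the DE result should be soft thresholding of $X^Ty$ at $\sigma^2/\gamma$.

```python
import numpy as np

def em_map(X, y, sigma2, omega2, n_iter=500, beta0=None):
    """EM for the posterior mode of beta under a normal scale-mixture prior.

    E-step: E[1/psi_j | beta_j] = p'(|beta_j|) / |beta_j| = 1 / omega2(|beta_j|).
    M-step: beta = Omega (Omega X^T X Omega + sigma2 I)^{-1} Omega X^T y.
    """
    k = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    beta = np.linalg.lstsq(X, y, rcond=None)[0] if beta0 is None else beta0.copy()
    for _ in range(n_iter):
        omega = np.sqrt(omega2(np.abs(beta)))
        M = omega[:, None] * XtX * omega[None, :] + sigma2 * np.eye(k)
        beta = omega * np.linalg.solve(M, omega * Xty)
    return beta

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.normal(size=(50, 5)))          # orthonormal columns: Q^T Q = I
y = Q @ np.array([3.0, -2.0, 0.3, -0.2, 1.5])
gamma = 1.0
beta_de = em_map(Q, y, sigma2=1.0, omega2=lambda a: gamma * a)   # DE (lasso) prior
print(np.round(beta_de, 4))                             # approximately [2, -1, 0, 0, 0.5]
```

Passing `omega2=lambda a: a**2` gives the normal-Jeffreys version, which shrinks large coefficients much less.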
- Kiiveri (2003) uses EM with a ridge line search in the M-step in his GENERAVE algorithm.
- Fan and Li (2001) use a Newton-Raphson style algorithm for their SCAD penalty and general likelihoods.
- LARS (Efron et al., Annals of Statistics 2003) and approximate GLM variants (Park and Hastie, 2006).
- Code that cycles through one-dimensional searches for the logistic likelihood (Genkin et al., 2005; C. Hoggart, 2006).
- Hyperparameters (shape and scale) are chosen by cross-validation with such fast algorithms.
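To illustrate the idea of cycling through one-dimensional searches, here is a minimal cyclic coordinate descent for the logistic likelihood with a double exponential (L1) penalty. It is a sketch in the spirit of the cited codes, not a reproduction of them: each one-dimensional step minimises a quadratic upper bound on $-\mathrm{loglik}$ (curvature $\|x_j\|^2/4$, since $p(1-p) \le 1/4$), which guarantees the objective never increases; there is no intercept, and `lam` and the simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(u, t):
    return np.sign(u) * max(abs(u) - t, 0.0)

def cd_l1_logistic(X, y, lam, n_sweeps=1000):
    """Cyclic coordinate descent for -loglik(beta) + lam * sum|beta_j| (y in {0, 1})."""
    n, k = X.shape
    beta = np.zeros(k)
    eta = np.zeros(n)                        # linear predictor X beta, kept up to date
    h = (X**2).sum(axis=0) / 4.0             # curvature upper bounds
    for _ in range(n_sweeps):
        for j in range(k):
            p = 1.0 / (1.0 + np.exp(-eta))
            g = X[:, j] @ (p - y)            # d(-loglik) / d beta_j
            new = soft_threshold(h[j] * beta[j] - g, lam) / h[j]
            eta += X[:, j] * (new - beta[j])
            beta[j] = new
    return beta

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 10))
beta_true = np.array([2.0, -1.5, 0, 0, 0, 0, 0, 0, 0, 1.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
lam = 5.0
beta = cd_l1_logistic(X, y, lam)
grad = X.T @ (1.0 / (1.0 + np.exp(-X @ beta)) - y)     # gradient of -loglik at the solution
print(np.round(beta, 3))
```

At convergence the optimality conditions can be checked directly: $|\mathrm{grad}_j| \le \lambda$ where $\beta_j = 0$, and $\mathrm{grad}_j = -\lambda\,\mathrm{sign}(\beta_j)$ elsewhere.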
Simulations
- NEG with shape λ and scale given by the mean µ of the underlying gamma distribution.
- n = 30: multiple regression with k = 1000 variables (spline knots); 5 random splits into two sets of 15, for training and testing.
$$y_i = f(x_i) + \epsilon_i, \qquad i = 1, \dots, 15,$$
where a spline is fitted,
$$f(x) = \sum_{j=1}^{1000} \beta_j (x - k_j)_+, \qquad k_j = j/1000,$$
with $\epsilon_i \sim N(0, \sigma^2)$, $\sigma = 0.1$, and $x_i$ iid uniform(0, 1).
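A sketch of this simulation set-up in NumPy, with a plain ridge fit standing in for the shrinkage priors. The knot positions $j/1000$, the true curve $\sin(2\pi x)$ and the ridge constant are assumptions made for illustration. The ridge estimate is computed through the $n \times n$ system $X^T(XX^T + \alpha I)^{-1}y$, in keeping with working in $n$ rather than $k$ dimensions.

```python
import numpy as np

def truncated_linear_basis(x, knots):
    """Design matrix with columns (x - k_j)_+ for each knot k_j."""
    return np.maximum(x[:, None] - knots[None, :], 0.0)

def ridge_fit_dual(X, y, alpha):
    """Ridge estimate via the n x n system: X^T (X X^T + alpha I)^{-1} y."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + alpha * np.eye(n), y)

rng = np.random.default_rng(7)
knots = np.arange(1, 1001) / 1000.0                            # assumed k_j = j / 1000
x = rng.uniform(0.0, 1.0, size=30)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.1, size=30)   # hypothetical true f, sigma = 0.1
X = truncated_linear_basis(x, knots)

test_mse = []
for _ in range(5):                                             # 5 random 15/15 splits
    perm = rng.permutation(30)
    train, test = perm[:15], perm[15:]
    beta = ridge_fit_dual(X[train], y[train], alpha=1e-2)
    test_mse.append(np.mean((y[test] - X[test] @ beta) ** 2))
print(np.round(test_mse, 4))
```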
[Figure 2: fit to sine curve]
Multiple regression simulation: σ² = 1; $X^T X$ in correlation form with AR(1) structure, lag-1 correlation ρ; n = 100 observations, k = 500 variables, 10 nonzero coefficients of β. Hyperparameters fitted by 5-fold cross-validation; tested on 10 datasets of 100 observations.
Table 1: MSE prediction results (oracle 1.00), for nonzero coefficients β = 1 and β = 5, each with ρ = 0.5 and ρ = 0.8, comparing Ridge, Lasso, NJeffreys, NEG (λ = 0.1) and NEG.
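A sketch of how one such dataset could be generated with NumPy: rows of $X$ are drawn from $N(0, \Sigma)$ with $\Sigma_{ij} = \rho^{|i-j|}$ (unit variances, so correlation form), 10 coefficients are set to the common value β, and $\sigma^2 = 1$. Placing the nonzero coefficients at random positions is an assumption. The last lines check the oracle benchmark: predicting fresh test sets with the true β gives an MSE close to $\sigma^2 = 1$.

```python
import numpy as np

def ar1_design(n, k, rho, rng):
    """n rows iid N(0, Sigma), Sigma_ij = rho^|i - j|."""
    idx = np.arange(k)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    L = np.linalg.cholesky(sigma)
    return rng.normal(size=(n, k)) @ L.T          # row covariance L L^T = Sigma

def simulate_dataset(n, k, rho, beta_value, rng, n_nonzero=10, noise_sd=1.0, nonzero=None):
    X = ar1_design(n, k, rho, rng)
    beta = np.zeros(k)
    if nonzero is None:                           # hypothetical: random positions
        nonzero = rng.choice(k, size=n_nonzero, replace=False)
    beta[nonzero] = beta_value
    y = X @ beta + rng.normal(scale=noise_sd, size=n)
    return X, y, beta

rng = np.random.default_rng(8)
X, y, beta = simulate_dataset(100, 500, rho=0.5, beta_value=1.0, rng=rng)   # one training set
support = np.flatnonzero(beta)

# Oracle benchmark: predict 10 fresh test sets of 100 observations with the true beta.
test_mse = []
for _ in range(10):
    X_test, y_test, _ = simulate_dataset(100, 500, 0.5, 1.0, rng, nonzero=support)
    test_mse.append(np.mean((y_test - X_test @ beta) ** 2))
oracle = np.mean(test_mse)
print(round(oracle, 2))                           # close to the noise variance 1
```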
Main References
- Griffin, J.E. and Brown, P.J. (2005). University of Kent/Warwick.
- Morris, J.S., Brown, P.J., Herrick, R.C., Baggerly, K.A. and Coombes, K.R. (2006). Submitted to Biometrics; BePress preprint.
- Morris, J.S. and Carroll, R.J. (2005). JRSSB.
- Morris, J.S., Brown, P.J., Baggerly, K.A. and Coombes, K.R. (2006). In Bayesian Inference for Gene Expression and Proteomics, eds. K.-A. Do, P. Mueller and M. Vannucci.
- Fan, J. and Li, R. (2001). JASA.
- Kiiveri, H. (2003). IMS Monograph: Festschrift for T. Speed.
- Park, M.Y. and Hastie, T. (2006). Web technical report, Stanford.
- Zou, H. (2006). JASA, to appear.
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 9-10 - High-dimensional regression Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Recap from
More informationInversion Base Height. Daggot Pressure Gradient Visibility (miles)
Stanford University June 2, 1998 Bayesian Backtting: 1 Bayesian Backtting Trevor Hastie Stanford University Rob Tibshirani University of Toronto Email: trevor@stat.stanford.edu Ftp: stat.stanford.edu:
More informationLecture 20 May 18, Empirical Bayes Interpretation [Efron & Morris 1973]
Stats 300C: Theory of Statistics Spring 2018 Lecture 20 May 18, 2018 Prof. Emmanuel Candes Scribe: Will Fithian and E. Candes 1 Outline 1. Stein s Phenomenon 2. Empirical Bayes Interpretation of James-Stein
More informationRegularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008
Regularized Estimation of High Dimensional Covariance Matrices Peter Bickel Cambridge January, 2008 With Thanks to E. Levina (Joint collaboration, slides) I. M. Johnstone (Slides) Choongsoon Bae (Slides)
More informationDISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania
Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the
More informationPart III. A Decision-Theoretic Approach and Bayesian testing
Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2
Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate
More informationBayesian Inference. Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative
More informationStatistics 203: Introduction to Regression and Analysis of Variance Penalized models
Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance
More informationRecursive Sparse Estimation using a Gaussian Sum Filter
Proceedings of the 17th World Congress The International Federation of Automatic Control Recursive Sparse Estimation using a Gaussian Sum Filter Lachlan Blackhall Michael Rotkowitz Research School of Information
More informationParametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory
Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007
More informationSmoothly Clipped Absolute Deviation (SCAD) for Correlated Variables
Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables LIB-MA, FSSM Cadi Ayyad University (Morocco) COMPSTAT 2010 Paris, August 22-27, 2010 Motivations Fan and Li (2001), Zou and Li (2008)
More informationSparse Bayesian Nonparametric Regression
François Caron caronfr@cs.ubc.ca Arnaud Doucet arnaud@cs.ubc.ca Departments of Computer Science and Statistics, University of British Columbia, Vancouver, Canada Abstract One of the most common problems
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationAdministration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books
STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,
More informationA Modern Look at Classical Multivariate Techniques
A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More information