L0 methods. H.J. Kappen, Donders Institute for Neuroscience, Radboud University, Nijmegen, the Netherlands. December 5, 2011.
1 L0 methods. H.J. Kappen, Donders Institute for Neuroscience, Radboud University, Nijmegen, the Netherlands. December 5, 2011.
2 Outline: George and McCulloch's model; the Variational Garrote.
3 Linear regression. Given data $x_i^\mu, y^\mu$, $\mu = 1, \ldots, p$, find weights $w_i$ that best describe the relation

$y^\mu = \sum_{i=1}^n w_i x_i^\mu + \xi^\mu$

Ordinary least squares (OLS) minimizes

$E_\text{OLS} = \sum_\mu \big( y^\mu - \sum_{i=1}^n w_i x_i^\mu \big)^2$

The solution is given by $w = \chi^{-1} b$, with $\chi_{ij} = \frac1p \sum_\mu x_i^\mu x_j^\mu$ and $b_i = \frac1p \sum_\mu x_i^\mu y^\mu$.

Problems: low prediction accuracy due to overfitting (when $p < n$), and poor interpretability: the OLS solution is not sparse.
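As a concrete illustration (not from the slides), a minimal numpy sketch of the OLS solution in the notation above, assuming a design matrix X of shape (p, n):

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares in the slide's notation: w = chi^{-1} b.

    X: (p, n) design matrix with rows x^mu; y: (p,) targets.
    Assumes chi is invertible (in general this requires p >= n).
    """
    p, n = X.shape
    chi = X.T @ X / p   # chi_ij = (1/p) sum_mu x_i^mu x_j^mu
    b = X.T @ y / p     # b_i   = (1/p) sum_mu x_i^mu y^mu
    return np.linalg.solve(chi, b)
```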
4 Ridge regression. Add a regularization term:

$E_\text{Ridge} = E_\text{OLS} + \lambda \sum_i w_i^2$, $\lambda > 0$, with solution $w = (\chi + \lambda I)^{-1} b$

Ridge regression:
- improves the prediction accuracy
- $\chi + \lambda I$ has maximal rank
- the solution is not sparse
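The ridge solution differs from the OLS sketch above only by the diagonal term; a minimal sketch (for $\lambda > 0$ the regularized matrix is always invertible):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge regression: w = (chi + lam*I)^{-1} b, lam > 0."""
    p, n = X.shape
    chi = X.T @ X / p
    b = X.T @ y / p
    return np.linalg.solve(chi + lam * np.eye(n), b)
```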
5 Lasso. Solve the OLS problem under the linear constraint $\sum_i |w_i| \le t$. Equivalently, add a regularization term:

$E_\text{Lasso} = E_\text{OLS} + \lambda \sum_i |w_i|$, $\lambda > 0$

There exist efficient methods to solve this quadratic programming problem. The solution tends to be sparse and improves both the prediction accuracy and the interpretability of the solution. Both ridge regression and Lasso are shrinkage methods: they find a solution that is biased towards smaller $w$.
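The slides do not specify a solver; one standard choice is coordinate descent with soft-thresholding. A minimal sketch for the objective $\frac12\|y - Xw\|^2 + \lambda\|w\|_1$ (this scaling differs from $E_\text{Lasso}$ above by a constant factor, so $\lambda$ values are not directly comparable):

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for 0.5*||y - X w||^2 + lam*||w||_1."""
    p, n = X.shape
    w = np.zeros(n)
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ w                     # residual
    for _ in range(n_iter):
        for i in range(n):
            r = r + X[:, i] * w[i]    # remove coordinate i from the residual
            w[i] = soft_threshold(X[:, i] @ r, lam) / col_sq[i]
            r = r - X[:, i] * w[i]
    return w
```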
6 Spike and slab. Introduce a prior distribution over $w_i$:

$p(w_i|s_i, \beta_\pm) = (1 - s_i)\,\mathcal N(w_i|0, \sigma_\text{spike}^2) + s_i\,\mathcal N(w_i|0, \sigma_\text{slab}^2)$, $s_i = 0, 1$
$p(s_i|\gamma) \propto \exp(\gamma s_i)$, $\gamma < 0$
$p(w_i) = \sum_{s_i = 0, 1} p(w_i|s_i)\, p(s_i|\gamma)$

with $1/\beta_- = \sigma_\text{spike}^2$ and $1/\beta_+ = \sigma_\text{slab}^2$.

Likelihood:

$p(y|x, w, \beta) = \sqrt{\tfrac{\beta}{2\pi}}\, \exp\!\Big(-\tfrac{\beta}{2}\big( y - \sum_{i=1}^n w_i x_i \big)^2\Big)$, $\quad p(D|w, \beta) = \prod_\mu p(y^\mu|x^\mu, w, \beta)$

George and McCulloch in addition assume priors over $\beta_\pm$, $\beta$ (and $\gamma$).
7 Posterior. The posterior becomes

$p(w, s|D, \beta, \beta_\pm, \gamma) = \dfrac{p(D|w, \beta)\, p(w|s, \beta_\pm)\, p(s|\gamma)}{p(D)}$

where $D$ is the data and

$p(D|w, \beta) \propto \prod_{\mu=1}^p \exp\!\Big(-\tfrac{\beta}{2}\big( y^\mu - \sum_i w_i x_i^\mu \big)^2\Big) \propto \exp\!\Big(-\tfrac{\beta p}{2}\big( \sum_{ij} w_i w_j \chi_{ij} - 2 \sum_i w_i b_i \big)\Big)$

$p(w|s, \beta_\pm) \propto \exp\!\Big(-\tfrac12 \sum_{i=1}^n \big( s_i \beta_+ w_i^2 + (1 - s_i) \beta_- w_i^2 \big)\Big)$
8 $p(s|\gamma) = \prod_i \dfrac{\exp(\gamma s_i)}{2 \cosh(\gamma)}$

For given $s$, the posterior distribution is Gaussian in $w$:

$p(w|s) \propto \exp\!\Big(-\tfrac12 \sum_{ij} (w_i - \bar w_i) A_{ij}(s) (w_j - \bar w_j)\Big)$
$A_{ij}(s) = \beta p \chi_{ij} + \big( s_i \beta_+ + (1 - s_i) \beta_- \big)\delta_{ij}$, $\quad \sum_j A_{ij} \bar w_j = \beta p\, b_i$

For given $w$, the posterior factorizes in $s$:

$p(s|w) \propto \exp\!\Big( \gamma \sum_i s_i - \tfrac12 \sum_{i=1}^n \big( s_i \beta_+ w_i^2 + (1 - s_i) \beta_- w_i^2 \big) \Big)$
9 Gibbs sampling. Sample $w$ conditioned on $s$: $w \sim \mathcal N\big( \bar w(s), A(s)^{-1} \big)$. Sample the $s_i$ independently:

$p(s_i = 1) = \dfrac{\exp\big( \gamma - \tfrac12 \beta_+ w_i^2 \big)}{\exp\big( \gamma - \tfrac12 \beta_+ w_i^2 \big) + \exp\big( -\tfrac12 \beta_- w_i^2 \big)}$
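A minimal sketch of one Gibbs sweep for this posterior, assuming fixed hyperparameters $\beta, \beta_\pm, \gamma$ (the function name and interface are mine, not from the slides):

```python
import numpy as np

def gibbs_sweep(s, X, y, beta, beta_plus, beta_minus, gamma, rng):
    """One Gibbs sweep: w | s is Gaussian, then each s_i | w is Bernoulli."""
    p, n = X.shape
    chi = X.T @ X / p
    b = X.T @ y / p
    # w | s ~ N(w_bar(s), A(s)^{-1})
    A = beta * p * chi + np.diag(s * beta_plus + (1 - s) * beta_minus)
    cov = np.linalg.inv(A)
    w_bar = beta * p * (cov @ b)
    w = rng.multivariate_normal(w_bar, cov)
    # s_i | w, independently:
    # p(s_i=1) = e^{gamma - beta_+ w_i^2/2} / (e^{gamma - beta_+ w_i^2/2} + e^{-beta_- w_i^2/2})
    log_on = gamma - 0.5 * beta_plus * w ** 2
    log_off = -0.5 * beta_minus * w ** 2
    p_on = 1.0 / (1.0 + np.exp(log_off - log_on))
    s = (rng.random(n) < p_on).astype(float)
    return w, s
```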
10 Spike and slab. The advantage of the spike and slab model is that it does not shrink $w$. However, MCMC is complex and time consuming.
11 The Garrote vil. [Figure: illustration of a garrote.]
12 The variational Garrote. Introduce binary variables $s_i = 0, 1$ that select features. The regression model becomes

$y^\mu = \sum_{i=1}^n w_i s_i x_i^\mu + \xi^\mu$, $\quad s_i = 0, 1$

Optimizing the $s_i$ is equivalent to finding the optimal subset of relevant features. Since the number of subsets is exponential in $n$, one has to resort to heuristic methods to find a good subset of features. Here we propose a variational approximation.
13 The variational Garrote. The likelihood term is given by

$p(y|x, s, w, \beta) = \sqrt{\tfrac{\beta}{2\pi}}\, \exp\!\Big(-\tfrac{\beta}{2}\big( y - \sum_{i=1}^n w_i s_i x_i \big)^2\Big)$

$p(D|s, w, \beta) = \prod_\mu p(y^\mu|x^\mu, s, w, \beta) = \Big(\tfrac{\beta}{2\pi}\Big)^{p/2} \exp\!\Big(-\tfrac{\beta p}{2}\Big( \sum_{i,j=1}^n s_i s_j w_i w_j \chi_{ij} - 2 \sum_{i=1}^n w_i s_i b_i + \sigma_y^2 \Big)\Big)$

with $b_i, \chi_{ij}$ as before and $\sigma_y^2 = \tfrac1p \sum_\mu (y^\mu)^2$.
14 The variational Garrote. For concreteness, we assume the prior over $s$

$p(s|\gamma) = \prod_{i=1}^n p(s_i|\gamma)$, $\quad p(s_i|\gamma) = \dfrac{\exp(\gamma s_i)}{1 + \exp(\gamma)}$

with $\gamma$ given; it specifies the sparsity of the solution. We further assume priors $p(\beta, w)$.
15 The variational Garrote. The posterior becomes

$p(s, w, \beta|D, \gamma) = \dfrac{p(w, \beta)\, p(s|\gamma)\, p(D|s, w, \beta)}{p(D|\gamma)}$

The posterior is intractable. Options:
- MCMC
- Variational Bayes
- Variational MAP
- BP, CVM, ...

Here we compute a variational MAP estimate: we approximate the marginal posterior $p(w, \beta|D, \gamma) = \sum_s p(s, w, \beta|D, \gamma)$ and compute the MAP solution with respect to $w, \beta$.
16 Breiman's Garrote method. The proposed model is similar to Breiman's Garrote method:

$y^\mu = \sum_{i=1}^n w_i s_i x_i^\mu + \xi^\mu$

which assumes $s_i \ge 0$ instead of binary. It computes $w_i$ using OLS and then finds the $s_i$ by minimizing

$\sum_\mu \Big( y^\mu - \sum_{i=1}^n x_i^\mu w_i s_i \Big)^2$ subject to $s_i \ge 0$, $\sum_i s_i \le t$

We refer to our method as the Binary Garrote (BG).
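For comparison, a sketch of Breiman's Garrote using a generic constrained solver (SLSQP from scipy); this is one possible implementation, not the one used in the slides:

```python
import numpy as np
from scipy.optimize import minimize

def breiman_garrote(X, y, w_ols, t):
    """Minimize sum_mu (y^mu - sum_i x_i^mu w_i s_i)^2 over s >= 0, sum(s) <= t."""
    n = X.shape[1]
    Xw = X * w_ols                       # column i is x_i * w_i (OLS weights fixed)
    fun = lambda s: np.sum((y - Xw @ s) ** 2)
    res = minimize(fun, x0=np.full(n, 0.5), method="SLSQP",
                   bounds=[(0.0, None)] * n,
                   constraints=[{"type": "ineq", "fun": lambda s: t - s.sum()}])
    return res.x
```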
17 The variational approximation. We compute the variational approximation using Jensen's inequality:

$-\log \sum_s p(s|\gamma)\, p(D|s, w, \beta) \le \sum_s q(s) \log \dfrac{q(s)}{p(s|\gamma)\, p(D|s, w, \beta)} = F(q, w, \beta)$

The optimal $q(s)$ is found by minimizing $F(q, w, \beta)$ with respect to $q(s)$. We consider the simplest case

$q(s) = \prod_{i=1}^n q_i(s_i)$, $\quad q_i(s_i) = m_i s_i + (1 - m_i)(1 - s_i)$

so $q(s)$ is parametrized by $m$.
18 The variational approximation. The expectation values with respect to $q$ can now be easily evaluated and the result is

$F = -\tfrac{p}{2} \log \tfrac{\beta}{2\pi} + \tfrac{\beta p}{2} \Big( \sum_{i,j} v_i v_j \chi_{ij} + \sum_i \tfrac{1 - m_i}{m_i} v_i^2 \chi_{ii} - 2 \sum_{i=1}^n v_i b_i + \sigma_y^2 \Big) - \gamma \sum_{i=1}^n m_i + n \log(1 + \exp(\gamma)) + \sum_{i=1}^n \big( m_i \log m_i + (1 - m_i) \log(1 - m_i) \big)$

where we have defined $v_i = m_i w_i$.
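A direct transcription of $F$ into numpy can serve as a sanity check for the fixed-point equations that follow; a minimal sketch in the $(m, v)$ parametrization (names are mine):

```python
import numpy as np

def free_energy(m, v, beta, gamma, chi, b, sy2, p):
    """Variational free energy F(m, w, beta, gamma) with v = m * w."""
    n = len(m)
    quad = (v @ chi @ v
            + np.sum((1 - m) / m * v ** 2 * np.diag(chi))
            - 2 * (v @ b) + sy2)
    return (-0.5 * p * np.log(beta / (2 * np.pi)) + 0.5 * beta * p * quad
            - gamma * np.sum(m) + n * np.log(1 + np.exp(gamma))
            + np.sum(m * np.log(m) + (1 - m) * np.log(1 - m)))
```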
19 The variational approximation. The approximate marginal posterior is then

$p(w, \beta|D, \gamma) \propto p(w, \beta) \sum_s p(s|\gamma)\, p(D|s, w, \beta) \ge p(w, \beta)\, \exp(-F(m, w, \beta, \gamma)) = \exp(-G(m, w, \beta, \gamma))$
$G(m, w, \beta, \gamma) = F(m, w, \beta, \gamma) - \log p(w, \beta)$

We can compute the variational approximation $m$ for given $w, \beta, \gamma$ by minimizing $F$ with respect to $m$. In addition, $p(w, \beta|D, \gamma)$ needs to be maximized with respect to $w, \beta$.
20 The variational approximation. Taking the derivatives of $G$ with respect to $m, v, \beta$ and setting them equal to zero gives the following set of fixed point equations:

$m_i = \sigma\Big( \gamma + \tfrac{\beta p}{2} \tfrac{v_i^2 \chi_{ii}}{m_i^2} \Big)$ with $\sigma(x) = (1 + \exp(-x))^{-1}$
$v = (\chi')^{-1} b$, $\quad \chi'_{ij} = \chi_{ij} + \tfrac{1 - m_i}{m_i} \chi_{ii} \delta_{ij}$
$\tfrac1\beta = -\sum_{i=1}^n v_i b_i + \sigma_y^2$
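A minimal sketch of iterating these fixed-point equations; the damping and clipping are my additions, since the slides do not specify an update schedule:

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def vg_fixed_point(X, y, gamma, n_iter=200, damping=0.5):
    """Iterate the VG fixed-point equations for m, v, beta at fixed gamma."""
    p, n = X.shape
    chi = X.T @ X / p
    b = X.T @ y / p
    sy2 = np.mean(y ** 2)
    chi_diag = np.diag(chi)
    m = np.full(n, 0.5)
    for _ in range(n_iter):
        chi_prime = chi + np.diag((1 - m) / m * chi_diag)
        v = np.linalg.solve(chi_prime, b)
        beta = 1.0 / (sy2 - v @ b)   # 1/beta = sigma_y^2 - sum_i v_i b_i (assumed > 0)
        m_new = sigma(gamma + 0.5 * beta * p * v ** 2 * chi_diag / m ** 2)
        m = np.clip((1 - damping) * m + damping * m_new, 1e-12, 1 - 1e-12)
    w = v / m
    return m, v, w, beta
```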
21 Comments. The variational approximation is not simply $w_i s_i \to w_i m_i$. If this were the case, the substitution $v_i = w_i m_i$ would remove $m_i$ from the equations and the OLS problem would be recovered. The reason is that $\langle s_i s_j \rangle = m_i m_j$ for $i \ne j$, but $\langle s_i^2 \rangle = \langle s_i \rangle = m_i$.

$\chi'$ differs from $\chi$ by adding a positive diagonal to it, making $\chi'$ automatically of maximal rank when $m_i < 1$. Roughly speaking, if $\chi$ has rank $p < n$, $\chi'$ can still be of rank $n$ when no more than $p$ of the $m_i = 1$, the remaining $n - p$ of the $m_i < 1$ making up for the rank deficiency.
22 Independent inputs. When the inputs are uncorrelated, $\chi_{ij} = \delta_{ij}$:

$w_i = b_i = \langle x_i y \rangle$, $\quad m_i = \sigma\Big( \gamma + \tfrac{\beta p}{2} b_i^2 \Big)$, $\quad \tfrac1\beta = \sigma_y^2 - \sum_i b_i^2 m_i$

$\sum_i b_i^2 m_i$ is the explained variance. The Garrote solution with $m_i \le 1$ has reduced explained variance, with (hopefully) a better prediction accuracy and interpretability.
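In the uncorrelated case the $n$-dimensional matrix inversion disappears and only the scalar coupling through $\beta$ remains; a minimal sketch:

```python
import numpy as np

def vg_independent(b, sy2, gamma, p, n_iter=100):
    """VG fixed point for chi = I: m_i = sigma(gamma + beta*p*b_i^2/2)."""
    m = np.full(len(b), 0.5)
    for _ in range(n_iter):
        beta = 1.0 / (sy2 - np.sum(b ** 2 * m))   # 1/beta = sy2 - explained variance
        m = 1.0 / (1.0 + np.exp(-(gamma + 0.5 * beta * p * b ** 2)))
    return m, beta
```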
23 Univariate case. In the 1-dimensional case these equations become

$m = \sigma\Big( \gamma + \tfrac{p}{2} \tfrac{\rho}{1 - \rho m} \Big) = f(m)$, $\quad \tfrac1\beta = \sigma_y^2 (1 - m \rho)$

with $\rho = b^2/\sigma_y^2$ the squared correlation coefficient.
24 Univariate case. $f(m)$ is an increasing function of $m$ and crosses the line $m$ either once or three times, depending on the values of $p, \gamma, \rho$. [Figure: $f(m)$ versus $m$. Left: different lines correspond to different values of $0 < \rho < 1$. Right: three solutions for $m$. The solutions close to $m = 0, 1$ correspond to local minima of $F$; the intermediate solution corresponds to a local maximum of $F$.]
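A quick numeric check of this crossing structure (grid scan for sign changes of $f(m) - m$; the parameter values below are illustrative, not taken from the slides):

```python
import numpy as np

def f(m, p, gamma, rho):
    return 1.0 / (1.0 + np.exp(-(gamma + 0.5 * p * rho / (1.0 - rho * m))))

def count_crossings(p, gamma, rho, grid=10000):
    m = np.linspace(1e-6, 1 - 1e-6, grid)
    g = f(m, p, gamma, rho) - m
    return int(np.sum(np.sign(g[:-1]) != np.sign(g[1:])))

# Illustrative: small p gives one crossing; large p with suitable gamma gives three.
print(count_crossings(p=5, gamma=-1.0, rho=0.5))     # -> 1
print(count_crossings(p=100, gamma=-35.0, rho=0.5))  # -> 3
```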
25 Univariate case. One can compute the critical $p^*$ for which multiple solutions occur; the tangency conditions $f(m) = m$, $f'(m) = 1$ (using $\sigma' = \sigma(1 - \sigma)$) give

$p^* = \dfrac{8(1 - \rho)}{\rho^2}$, attained at $m^* = \dfrac{1}{2 - \rho}$

$p^*$ is a decreasing function of $\rho$. For $p > p^*$, we find two solutions for $m$. For $p < p^*$, we find one solution for $m$. [Figure. Left: phase plot in $(\rho, \gamma)$; the dotted line is the solution for $\gamma$ when $m = 1/2$. Right: $m$ versus $\rho$ for two settings of $\gamma$ and $p$.]
26 Transfer function. Suppose that data are generated from the model $y = wx + \xi$, with $\langle \xi^2 \rangle = \langle x^2 \rangle = 1$. [Figure: estimated $w$ versus true $w$ for VG, ridge, garrote and lasso. Binary Garrote (VG); ridge regression with $\lambda = 0.5$; Garrote with $\gamma = 1/4$; Lasso with $\gamma = 0.5$.]
27 Numerical examples. Inputs are generated from a mean-zero multivariate Gaussian distribution with specified covariance structure. We generate outputs $y^\mu = \sum_i \hat w_i x_i^\mu + \xi^\mu$ with $\xi^\mu \sim \mathcal N(0, \hat\sigma)$. For each example, we generate a training set, a validation set and a test set (of sizes $p/p_v/p_t$). For each value of the hyperparameters ($\gamma$ in the case of BG, $\lambda$ in the case of ridge regression and Lasso), we optimize the model parameters on the training set. We optimize the hyperparameters on the validation set.
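This protocol can be expressed as a short loop; a minimal sketch reusing the hypothetical vg_fixed_point fit from slide 20 (any of the three methods plugs into the same pattern):

```python
import numpy as np

def select_gamma(X_tr, y_tr, X_val, y_val, gammas):
    """Fit on the training set for each gamma; pick gamma by validation error."""
    best = (np.inf, None, None)
    for gamma in gammas:
        m, v, w, beta = vg_fixed_point(X_tr, y_tr, gamma)   # hypothetical fit
        err = np.mean((y_val - X_val @ (m * w)) ** 2)       # validation error
        if err < best[0]:
            best = (err, gamma, m * w)
    return best   # (validation error, chosen gamma, effective weights v = m*w)
```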
28 Example 1. $x_i^\mu \sim \mathcal N(0, 1)$ independently. $\hat w = (1, 0, \ldots, 0)$, $n = 100$ and $\hat\sigma = 1$. $p/p_v/p_t = 50/50/400$. [Figure: Binary Garrote. Free energy $F$ versus $\gamma$ for forward and backward sweeps; train and validation error versus $\gamma$; $v_1$ and $v_{2:n}$ versus $\gamma$.]
29 Example 1 (continued). Same setup: $x_i^\mu \sim \mathcal N(0, 1)$ independently, $\hat w = (1, 0, \ldots, 0)$, $n = 100$, $\hat\sigma = 1$, $p/p_v/p_t = 50/50/400$. [Figure: Lasso (top row) and ridge regression (bottom row): train and validation error versus $\lambda$, and $w_1$, $w_{2:n}$ versus $\lambda$.]
30 Example 1 (continued). Same setup as above. [Table: train, validation and test error, number of non-zero weights, and errors $\delta w_1$, $\delta w_2$ for Ridge, Lasso, BG and the true model; results on random instances.]
31 Example 2. $x^\mu \sim \mathcal N(0, \Sigma)$ with $\Sigma_{ij} = \delta^{|i - j|}$, $\delta = 0.5$. $\hat w_i = 1$ for $i = 1, 2, 5, 10, 15$ and all other $\hat w_i = 0$; $n = 100$, $\hat\sigma = 1$. $p/p_v/p_t = 50/50/400$. [Table: train, validation and test error, number of non-zero weights (5 for the true model), and $\delta w_1$, $\delta w_2$ for Lasso, BG and the true model; results on random instances. Scatter plots compare Lasso and BG.]
32 Dependence on noise. Data as in Example 1. [Figure: test error, gap, and $\delta w$ versus noise level $\sigma$, for Lasso and VG. All results are averaged over multiple runs.]
33 Implementation issues for high-dimensional problems. For large $n$, the most expensive part of the computation is the inversion of $\chi'$. Note that the free energy can also be written as

$F = -\tfrac{p}{2} \log \tfrac{\beta}{2\pi} + \tfrac{\beta p}{2} \Big( \tfrac1p \sum_\mu (z^\mu)^2 + \sum_i \tfrac{1 - m_i}{m_i} v_i^2 \chi_{ii} - 2 \sum_{i=1}^n v_i b_i + \sigma_y^2 \Big) - \gamma \sum_{i=1}^n m_i + n \log(1 + \exp(\gamma)) + \sum_{i=1}^n \big( m_i \log m_i + (1 - m_i) \log(1 - m_i) \big)$

with $z^\mu = \sum_i x_i^\mu v_i$. We can thus minimize $F$ with respect to $v, z$ under linear constraints without the need to compute the covariance matrix $\chi$. This is a quadratic optimization problem.
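A sketch of the $\chi$-free objective and its gradient, with the constraint $z = Xv$ substituted directly so any first-order solver applies (the slides use a dedicated QP solver instead; names here are mine):

```python
import numpy as np
from scipy.optimize import minimize

def fit_v_large_n(X, y, m, chi_diag):
    """Minimize the quadratic part of F over v without forming the n x n matrix chi.

    Objective: (1/p) z'z + sum_i ((1-m_i)/m_i) chi_ii v_i^2 - 2 v'b, with z = X v.
    """
    p, n = X.shape
    b = X.T @ y / p
    d = (1 - m) / m * chi_diag

    def fun(v):
        z = X @ v
        return (z @ z) / p + np.sum(d * v ** 2) - 2 * (v @ b)

    def grad(v):
        return 2 * (X.T @ (X @ v)) / p + 2 * d * v - 2 * b

    res = minimize(fun, np.zeros(n), jac=grad, method="L-BFGS-B")
    return res.x
```

At the minimum, $(\chi + \mathrm{diag}(d))\, v = b$, i.e. $\chi' v = b$, so this recovers the fixed-point solution for $v$ while only ever touching $X$.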
34 Implementation issues for high-dimensional problems. The quadratic program can be computed in time linear in $n$. [Table: CPU times in seconds for solving $v$ by matrix inversion versus solving the QP problem using MOSEK, as a function of $n$; the problem is as described in Example 1 with $p = 50$, $\hat\sigma = 1$. A second table compares Lasso and BG (error, $\delta w_2$, CPU seconds) for increasing $n$.]
35 Discussion.
Local minima:
- appear for few and noisy data
- seem modest for (very) sparse problems
- increasing $\gamma$ increases $\beta$ and works as an annealing schedule
Extensions:
- MAP: TAP, BP, CVM
- full Bayes: MCMC, VB, ...
- use of priors (on $\gamma$) instead of cross validation
Applications:
- finding the structure of networks, both static and dynamic
- finding genes in GWAS
- ...
Reference: arxiv.org/abs/1109.0486
arxiv: v2 [stat.me] 8 Dec 2011
Abstract In this paper, I present a new solution method for sparse regression using L regularization. The model introduces a sparseness mechanism in the likelihood, instead of in the prior, as is done
More informationThe Variational Garrote
Mach Learn (2014) 96:269 294 DOI 10.1007/s10994-013-5427-7 The Variational Garrote Hilbert J. Kappen Vicenç Gómez Received: 7 January 2012 / Accepted: 18 November 2013 / Published online: 6 December 2013
More informationLeast Absolute Shrinkage is Equivalent to Quadratic Penalization
Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationRegularized Regression A Bayesian point of view
Regularized Regression A Bayesian point of view Vincent MICHEL Director : Gilles Celeux Supervisor : Bertrand Thirion Parietal Team, INRIA Saclay Ile-de-France LRI, Université Paris Sud CEA, DSV, I2BM,
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationBayesian construction of perceptrons to predict phenotypes from 584K SNP data.
Bayesian construction of perceptrons to predict phenotypes from 584K SNP data. Luc Janss, Bert Kappen Radboud University Nijmegen Medical Centre Donders Institute for Neuroscience Introduction Genetic
More informationMachine Learning - MT & 5. Basis Expansion, Regularization, Validation
Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationPart 2: Multivariate fmri analysis using a sparsifying spatio-temporal prior
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 2: Multivariate fmri analysis using a sparsifying spatio-temporal prior Tom Heskes joint work with Marcel van Gerven
More informationRegression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh.
Regression Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh September 24 (All of the slides in this course have been adapted from previous versions
More informationCSci 8980: Advanced Topics in Graphical Models Gaussian Processes
CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More informationy(x) = x w + ε(x), (1)
Linear regression We are ready to consider our first machine-learning problem: linear regression. Suppose that e are interested in the values of a function y(x): R d R, here x is a d-dimensional vector-valued
More informationIntroduction to Machine Learning
Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationBayesian methods in economics and finance
1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationWeek 3: Linear Regression
Week 3: Linear Regression Instructor: Sergey Levine Recap In the previous lecture we saw how linear regression can solve the following problem: given a dataset D = {(x, y ),..., (x N, y N )}, learn to
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationIEOR 165 Lecture 7 1 Bias-Variance Tradeoff
IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to
More informationConsistent high-dimensional Bayesian variable selection via penalized credible regions
Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable
More informationLinear Regression. Volker Tresp 2018
Linear Regression Volker Tresp 2018 1 Learning Machine: The Linear Model / ADALINE As with the Perceptron we start with an activation functions that is a linearly weighted sum of the inputs h = M j=0 w
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationApproximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA)
Approximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA) Willem van den Boom Department of Statistics and Applied Probability National University
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationBayesian learning of sparse factor loadings
Magnus Rattray School of Computer Science, University of Manchester Bayesian Research Kitchen, Ambleside, September 6th 2008 Talk Outline Brief overview of popular sparsity priors Example application:
More informationHierarchical Modeling for Univariate Spatial Data
Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This
More informationy Xw 2 2 y Xw λ w 2 2
CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:
More informationPartial factor modeling: predictor-dependent shrinkage for linear regression
modeling: predictor-dependent shrinkage for linear Richard Hahn, Carlos Carvalho and Sayan Mukherjee JASA 2013 Review by Esther Salazar Duke University December, 2013 Factor framework The factor framework
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationMachine Learning Linear Regression. Prof. Matteo Matteucci
Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationDATA MINING AND MACHINE LEARNING
DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationMultivariate Bayesian Linear Regression MLAI Lecture 11
Multivariate Bayesian Linear Regression MLAI Lecture 11 Neil D. Lawrence Department of Computer Science Sheffield University 21st October 2012 Outline Univariate Bayesian Linear Regression Multivariate
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationLearning Gaussian Graphical Models with Unknown Group Sparsity
Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationRelevance Vector Machines
LUT February 21, 2011 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise Support Vector Machines The relevance vector machine (RVM) is a bayesian
More informationStatistics 203: Introduction to Regression and Analysis of Variance Penalized models
Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance
More informationSparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference
Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference Shunsuke Horii Waseda University s.horii@aoni.waseda.jp Abstract In this paper, we present a hierarchical model which
More informationLogistic Regression with the Nonnegative Garrote
Logistic Regression with the Nonnegative Garrote Enes Makalic Daniel F. Schmidt Centre for MEGA Epidemiology The University of Melbourne 24th Australasian Joint Conference on Artificial Intelligence 2011
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationLinear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan
Linear Regression CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationThe joint posterior distribution of the unknown parameters and hidden variables, given the
DERIVATIONS OF THE FULLY CONDITIONAL POSTERIOR DENSITIES The joint posterior distribution of the unknown parameters and hidden variables, given the data, is proportional to the product of the joint prior
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationProbabilistic machine learning group, Aalto University Bayesian theory and methods, approximative integration, model
Aki Vehtari, Aalto University, Finland Probabilistic machine learning group, Aalto University http://research.cs.aalto.fi/pml/ Bayesian theory and methods, approximative integration, model assessment and
More informationPREDICTING SOLAR GENERATION FROM WEATHER FORECASTS. Chenlin Wu Yuhan Lou
PREDICTING SOLAR GENERATION FROM WEATHER FORECASTS Chenlin Wu Yuhan Lou Background Smart grid: increasing the contribution of renewable in grid energy Solar generation: intermittent and nondispatchable
More informationOutline Lecture 2 2(32)
Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ and Center for Automated Learning and
More informationLinear Models for Regression
Linear Models for Regression Machine Learning Torsten Möller Möller/Mori 1 Reading Chapter 3 of Pattern Recognition and Machine Learning by Bishop Chapter 3+5+6+7 of The Elements of Statistical Learning
More informationPhysics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester
Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationModeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop
Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector
More informationGaussian Process Regression
Gaussian Process Regression 4F1 Pattern Recognition, 21 Carl Edward Rasmussen Department of Engineering, University of Cambridge November 11th - 16th, 21 Rasmussen (Engineering, Cambridge) Gaussian Process
More informationGWAS IV: Bayesian linear (variance component) models
GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian
More informationComposite Loss Functions and Multivariate Regression; Sparse PCA
Composite Loss Functions and Multivariate Regression; Sparse PCA G. Obozinski, B. Taskar, and M. I. Jordan (2009). Joint covariate selection and joint subspace selection for multiple classification problems.
More informationLecture 1b: Linear Models for Regression
Lecture 1b: Linear Models for Regression Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced
More informationBayesian Gaussian / Linear Models. Read Sections and 3.3 in the text by Bishop
Bayesian Gaussian / Linear Models Read Sections 2.3.3 and 3.3 in the text by Bishop Multivariate Gaussian Model with Multivariate Gaussian Prior Suppose we model the observed vector b as having a multivariate
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationLinear regression example Simple linear regression: f(x) = ϕ(x)t w w ~ N(0, ) The mean and covariance are given by E[f(x)] = ϕ(x)e[w] = 0.
Gaussian Processes Gaussian Process Stochastic process: basically, a set of random variables. may be infinite. usually related in some way. Gaussian process: each variable has a Gaussian distribution every
More informationGWAS V: Gaussian processes
GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011
More informationSparse Covariance Selection using Semidefinite Programming
Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support
More information6.867 Machine Learning
6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of
More informationDEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures
More informationSparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28
Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:
More informationComputer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression
Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:
More informationBayesian Linear Regression [DRAFT - In Progress]
Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory
More informationComputer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression
Group Prof. Daniel Cremers 4. Gaussian Processes - Regression Definition (Rep.) Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationOutline lecture 2 2(30)
Outline lecture 2 2(3), Lecture 2 Linear Regression it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic Control
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationCS 195-5: Machine Learning Problem Set 1
CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationLatent state estimation using control theory
Latent state estimation using control theory Bert Kappen SNN Donders Institute, Radboud University, Nijmegen Gatsby Unit, UCL London August 3, 7 with Hans Christian Ruiz Bert Kappen Smoothing problem Given
More informationRegression Shrinkage and Selection via the Lasso
Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationDimension Reduction Methods
Dimension Reduction Methods And Bayesian Machine Learning Marek Petrik 2/28 Previously in Machine Learning How to choose the right features if we have (too) many options Methods: 1. Subset selection 2.
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationLinear Regression. CSL603 - Fall 2017 Narayanan C Krishnan
Linear Regression CSL603 - Fall 2017 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis Regularization
More informationA new Hierarchical Bayes approach to ensemble-variational data assimilation
A new Hierarchical Bayes approach to ensemble-variational data assimilation Michael Tsyrulnikov and Alexander Rakitko HydroMetCenter of Russia College Park, 20 Oct 2014 Michael Tsyrulnikov and Alexander
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013 Book Rasmussen and Williams (2006) Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationReferences. Lecture 7: Support Vector Machines. Optimum Margin Perceptron. Perceptron Learning Rule
References Lecture 7: Support Vector Machines Isabelle Guyon guyoni@inf.ethz.ch An training algorithm for optimal margin classifiers Boser-Guyon-Vapnik, COLT, 992 http://www.clopinet.com/isabelle/p apers/colt92.ps.z
More informationScale Mixture Modeling of Priors for Sparse Signal Recovery
Scale Mixture Modeling of Priors for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Jason Palmer, Zhilin Zhang and Ritwik Giri Outline Outline Sparse
More informationHierarchical Modelling for Univariate Spatial Data
Hierarchical Modelling for Univariate Spatial Data Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More information