Regularized Regression: A Bayesian Point of View
1 Regularized Regression: A Bayesian Point of View. Vincent MICHEL. Director: Gilles Celeux. Supervisor: Bertrand Thirion. Parietal Team, INRIA Saclay Île-de-France; LRI, Université Paris Sud; CEA, DSV, I2BM, Neurospin. Vincent MICHEL (Inria - Parietal), Inverse Inference, MM, 22 février. 1/34
2 Outline: 1. Introduction to Regularization. 2. Bayesian Regression.
3 Example of Application in fMRI. Fig.: Example of application of regression (here SVR) in an fMRI paradigm. For clouds containing different numbers of dots, we predict the quantity seen by the subject. In collaboration with E. Eger, INSERM U562.
4 Introduction. Regression: given a set of m images of n variables, X = (x₁, ..., xₙ) ∈ ℝ^{m×n}, and a target y ∈ ℝ^m, we have the following regression model: y = Xw + ɛ, with w the parameters of the model (the "weights") and ɛ the error. The estimate ŵ of w is commonly obtained by Ordinary Least Squares: ŵ = argmin_w ‖y − Xw‖², i.e. ŵ = (XᵀX)⁻¹Xᵀy.
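As an illustration (not from the talk; the data below is synthetic and all names are illustrative), the OLS estimate can be computed directly from the normal equations:

```python
import numpy as np

# Synthetic data: m = 50 samples, n = 3 features (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=50)

# OLS: w_hat = (X^T X)^{-1} X^T y, computed via a linear solve
# rather than an explicit matrix inverse for numerical stability.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With this little noise and 50 samples, w_hat recovers w_true closely.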
5 Regularization. Motivations: OLS estimates often have low bias but large variance. All features get a weight; a smaller subset with strong effects is more interpretable. We need some shrinkage (or regularization) to constrain ŵ: constrain the values of ŵ so that a few parameters explain the data well. Bridge Regression (I.E. Frank and J.H. Friedman): ŵ = argmin_w ‖y − Xw‖² + λ Σᵢ |wᵢ|^γ, with γ ≥ 0, where the first term is the data fit and the second the penalization.
6 Ridge Regression. L₂ norm (A.E. Hoerl and R.W. Kennard). Ridge regression introduces a regularization with the L₂ norm: ŵ = argmin_w ‖y − Xw‖² + λ₂‖w‖₂². Properties: sacrifices a little bias to reduce the variance of the predicted values, hence more stable. Grouping effect: voxels with close signals will have close coefficients, which exhibits groups of correlated features. Keeps all the regressors in the model, hence a model that is not easily interpretable.
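A minimal sketch of the closed-form ridge solution (synthetic data, names illustrative); note how a large λ₂ shrinks every coefficient toward zero while keeping them all in the model:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X^T X + lam * I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=40)

w_ols = ridge(X, y, 0.0)     # lam = 0 recovers OLS
w_shrunk = ridge(X, y, 1e6)  # a huge lam drives all weights toward 0
```

No coefficient in w_shrunk is exactly zero: ridge shrinks but never selects.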
7 Ridge Regression - Simulation. Fig.: Coefficients of Ridge Regression for different values of λ₂.
8 Lasso - Least Absolute Shrinkage and Selection Operator. L₁ norm (R. Tibshirani). Lasso regression introduces a regularization with the L₁ norm: ŵ = argmin_w ‖y − Xw‖² + λ₁‖w‖₁. Properties: parsimony, since only a small subset of features with ŵᵢ ≠ 0 is selected, which increases interpretability (especially when combined with parcels). If n > m, i.e. more features than samples, the Lasso selects at most m variables. More difficult to implement than Ridge Regression.
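The Lasso has no closed form; a short coordinate-descent sketch (soft-thresholding one coefficient at a time; synthetic data, not from the talk) illustrates the parsimony property on irrelevant features:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward 0 by t, clipping at 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min_w ||y - Xw||^2 + lam * ||w||_1 (a sketch)."""
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r = y - X @ w + X[:, j] * w[j]  # residual with feature j removed
            w[j] = soft_threshold(X[:, j] @ r, lam / 2.0) / col_sq[j]
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
y = X @ np.array([3.0, 0.0, 0.0, -2.0]) + 0.01 * rng.normal(size=30)
w = lasso_cd(X, y, lam=5.0)  # the two irrelevant weights are driven to (near) zero
```

The λ/2 inside the threshold comes from the objective having no ½ factor on the quadratic term.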
9 Lasso - Simulation. Fig.: Coefficients of Lasso Regression for different values of the shrinkage ratio ‖ŵ‖₁ / ‖ŵ_OLS‖₁.
10 Outline: 1. Introduction to Regularization. 2. Bayesian Regression.
11 Bayesian Regression. Questions: What is a graphical model? Why use Bayesian probabilities? What is a prior? What is a Hierarchical Bayesian Model? Why use hyperparameters?
12 Bayesian Regression and Graphical Model. Model: the regression model y = Xw + ɛ can be seen as the following probabilistic model: p(y | X, w, λ) = N(Xw, λ), with p(ɛ) = N(0, λ), i.e. ɛ ~ N(0, λIₙ). [Graphical model: λ → ɛ; X and w combine with ɛ to give y = Xw + ɛ.] Can we add some (prior) knowledge on the parameters?
13 Bayesian Probabilities. Main idea: introducing prior information in the modelling. Bayes' theorem: p(w | y) = p(y | w) p(w) / p(y), where p(w) is called a prior. This prior, introduced before any access to the data, reflects our prior knowledge of w. Moreover, the posterior probability p(w | y) gives us the uncertainty in w after we have observed y. Example: an unbiased coin tossed three times gives 3 heads. A frequentist approach: p(head) = 1, so always heads for all future tosses. A Bayesian approach: we can add a prior (Beta) to represent our prior knowledge that an unbiased coin gives less extreme results. See Bishop, Pattern Recognition and Machine Learning, for more details.
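The coin example can be made concrete: a Beta prior on p(head) is conjugate to binomial data, so the posterior is again a Beta. The Beta(2, 2) prior below is an illustrative choice, not one stated in the talk:

```python
from fractions import Fraction

# Observe 3 heads in 3 tosses of a coin believed to be roughly fair.
heads, tails = 3, 0
a, b = 2, 2  # Beta(2, 2) prior, centred on p = 1/2 (illustrative)

# Frequentist (maximum likelihood): p(head) = 3/3 = 1.
ml_estimate = Fraction(heads, heads + tails)

# Bayesian: posterior is Beta(a + heads, b + tails); its mean pulls
# the estimate back toward 1/2, away from the extreme value 1.
posterior_mean = Fraction(a + heads, a + b + heads + tails)  # 5/7
```

The posterior mean 5/7 is less extreme than the maximum-likelihood estimate 1, exactly the behaviour the slide describes.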
14 Priors for Bayesian Regression - Ridge Regression. Prior for the weights w: penalization by a weighted L₂ norm is equivalent to setting a Gaussian prior on the weights, w ~ N(0, αI): Bayesian Ridge Regression (C. Bishop, 2000). Prior for the noise ɛ: classical prior (M.E. Tipping, 2000), ɛ ~ N(0, λIₙ).
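The equivalence can be checked numerically (a sketch with synthetic data; here α and λ are read as variances, matching the slide's notation): the MAP estimate under w ~ N(0, αI) and ɛ ~ N(0, λIₙ) is a ridge solution with effective penalty λ/α.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.1 * rng.normal(size=30)

# MAP under the Gaussian prior/noise model = ridge with penalty lam/alpha.
lam, alpha = 0.01, 10.0
w_map = np.linalg.solve(X.T @ X + (lam / alpha) * np.eye(X.shape[1]), X.T @ y)

# Sanity check: w_map is a stationary point of the ridge objective
# ||y - Xw||^2 + (lam/alpha) * ||w||^2, so its gradient there vanishes.
grad = 2 * X.T @ (X @ w_map - y) + 2 * (lam / alpha) * w_map
```

A vague prior (large α) makes the penalty λ/α small and the MAP estimate approaches OLS.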
15 Graphical Model - Ridge Regression. [Graphical model: ɛ ~ N(0, λIₙ), w ~ N(0, αI), y = Xw + ɛ.]
16 Example on Simulated Data - Ridge Regression. Log-density of the distribution of w for the Bayesian Ridge Regression. Can we find some more adaptive priors?
17 Priors for Bayesian Regression - ARD. Prior for the weights w: we use a different Gaussian prior for each weight, w ~ N(0, A⁻¹), A = diag(α₁, ..., αₘ), with αᵢ ≠ αⱼ for i ≠ j: Automatic Relevance Determination, ARD (MacKay, 1994). Prior for the noise ɛ: classical prior (M.E. Tipping, 2000), ɛ ~ N(0, λIₙ).
18 Graphical Model - ARD. [Graphical model: ɛ ~ N(0, λIₙ), w ~ N(0, A⁻¹) with A = diag(α₁, ..., αₘ), y = Xw + ɛ.]
19 Example on Simulated Data - ARD. Log-density of the distribution of w for ARD.
20 Other (Less Classical) Priors for Bayesian Regression. LASSO: Laplacian prior for the weights w. Penalization by a weighted L₁ norm is equivalent to setting Laplacian priors on the weights: p(w) ∝ C exp(−λ‖w‖₁). Elastic Net: complex prior for the weights w. Penalization by weighted L₁ and L₂ norms is equivalent to setting a more complex prior on the weights: p(w) ∝ C(λ, α) exp(−λ(‖w‖₁ + α‖w‖₂²)). But... it is difficult to estimate the parameters of these models.
21 Notions of Hyperpriors. What is a hyperprior? The parameters of a prior are called hyperparameters; e.g. for ARD, with w ~ N(0, A⁻¹) and A = diag(α₁, ..., αₘ), each αᵢ is a hyperparameter. A hyperprior is a prior on the hyperparameters. It allows the parameters of the model to adapt to the data, and defines a Hierarchical Bayesian Model. Classical hyperpriors: see C. Bishop & M.E. Tipping.
22 Bayesian Regression - Hierarchical Bayesian Model. [Graphical model: p(λ) = Γ(λ₁, λ₂) with ɛ ~ N(0, λIₙ); p(αᵢ) = Γ(γ₁ⁱ, γ₂ⁱ) with w ~ N(0, A⁻¹), A = diag(α₁, ..., αₘ); y = Xw + ɛ.] But how do we learn the parameters?
23 THE Difficult Question: How to Estimate Our Model? We have a model and some data; how do we learn the parameters? Likelihood: the likelihood is the quantity p(y | X, w, λ) viewed as a function of the parameters w. It expresses how probable the observed data is for different settings of w. The main idea is to find the parameters that maximize the log-likelihood L(y | w), i.e. to choose the parameters for which the probability of the observed data y is maximized. Simple methods, when L(y | w) is not too complicated: Maximum A Posteriori (MAP); Expectation-Maximization, EM (Dempster, 1977), which can be used in the presence of latent variables (e.g. mixtures of Gaussians).
24 When L(y | w) IS Too Complicated... Sampling methods: used when we cannot directly evaluate the distributions. Metropolis-Hastings, Gibbs sampling, simulated annealing. Computationally expensive. [Fig.: 2-d example of Gibbs sampling.]
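A minimal Gibbs sampler for a 2-d case like the one in the figure (standard-library Python; the correlated-Gaussian target is an assumption chosen for illustration): sample each coordinate from its full conditional in turn.

```python
import math
import random

# Target: bivariate Gaussian, zero means, unit variances, correlation rho.
# Full conditionals: x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically.
random.seed(0)
rho = 0.8
x1 = x2 = 0.0
samples = []
for _ in range(20000):
    x1 = random.gauss(rho * x2, math.sqrt(1 - rho ** 2))
    x2 = random.gauss(rho * x1, math.sqrt(1 - rho ** 2))
    samples.append((x1, x2))

kept = samples[2000:]  # discard burn-in
mean_x1 = sum(s[0] for s in kept) / len(kept)          # should be near 0
est_corr = sum(s[0] * s[1] for s in kept) / len(kept)  # should be near rho
```

The cost is visible even here: thousands of draws are needed for two dimensions, which is why these methods are expensive at fMRI scale.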
25 When L(y | w) IS Too Complicated... Variational Bayes framework (Beal, M.J., 2003). Based on some (strong) assumptions and approximations: Laplace approximation, a Gaussian approximation of a probability density; factorization of some distributions. It is equivalent to the maximization of the Free Energy, which is a lower bound on the log-likelihood.
26 Limits of the Variational Bayes Framework. Sometimes the approximations of the VB framework are too strong for the model, and the free energy is no longer useful...
27 Results on Real Data - Visual Task. Fig.: w for Elastic Net (left) and Bayesian Regression (right).
28 Conclusion. Bayesian Regression: allows one to introduce prior information. The posterior distribution gives the uncertainty of the parameters after observing the data. Model estimation can be computationally expensive. Poor priors will lead to poor results with high confidence, so always validate the results by cross-validation! Thank you for your attention!
30 ARD - A Practical Point of View. EM: αᵢ_new = (1 + 2γ₁ⁱ) / (μᵢ² + Σᵢᵢ + 2γ₂ⁱ), λ_new = (n + 2λ₁) / (‖y − Xμ‖² + Trace(ΣXᵀX) + 2λ₂), with w = μ = λΣXᵀy and Σ = (A + λXᵀX)⁻¹. Gibbs sampling: p(w | θ∖{w}) ∝ N(w | μ, Σ), p(λ | θ∖{λ}) ∝ Γ(l₁, l₂), p(α | θ∖{α}) ∝ Πᵢ₌₁ᵐ Γ(αᵢ | g₁ⁱ, g₂ⁱ), with g₁ⁱ = γ₁ⁱ + 1/2, g₂ⁱ = γ₂ⁱ + wᵢ²/2, l₁ = λ₁ + n/2 and l₂ = λ₂ + ‖y − Xw‖²/2.
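The EM updates above translate directly into code. The sketch below uses synthetic data and near-flat Gamma hyperpriors (γ₁ⁱ = γ₂ⁱ = λ₁ = λ₂ = 10⁻⁶, an assumed default, not a value from the talk); ARD prunes the irrelevant features by driving their αᵢ up.

```python
import numpy as np

def ard_em(X, y, g1=1e-6, g2=1e-6, l1=1e-6, l2=1e-6, n_iter=100):
    """EM updates for ARD as on the slide; lam is the noise *precision*."""
    n_samples, n_features = X.shape
    alpha = np.ones(n_features)
    lam = 1.0
    for _ in range(n_iter):
        Sigma = np.linalg.inv(np.diag(alpha) + lam * X.T @ X)
        mu = lam * Sigma @ X.T @ y
        alpha = (1 + 2 * g1) / (mu ** 2 + np.diag(Sigma) + 2 * g2)
        resid = np.sum((y - X @ mu) ** 2) + np.trace(Sigma @ X.T @ X)
        lam = (n_samples + 2 * l1) / (resid + 2 * l2)
    return mu, alpha, lam

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 4))
y = X @ np.array([5.0, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=50)
mu, alpha, lam = ard_em(X, y)  # mu[0] lands near 5; alpha grows on the rest
```

Only the relevant feature keeps a small αᵢ (a broad prior); the others are effectively removed from the model.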
31 Estimation by Maximum Likelihood (ML). Likelihood: the likelihood is the quantity p(y | X, w, λ) viewed as a function of the parameters w. It expresses how probable the observed data is for different settings of w. Estimation by ML: the main idea is to find the parameter ŵ that maximizes the log-likelihood L(y | w): ŵ = argmax_w L(y | w), i.e. choosing ŵ for which the probability of the observed data y is maximized. The solution is (hopefully!) the same as the one found by OLS: ŵ = (XᵀX)⁻¹Xᵀy.
32 How to Construct the Prediction Ŷ? Iterative algorithm: Ŷ = Σᵢ₌₁ⁿ xᵢ β̂ᵢ = Xβ̂. Start with Ŷ = 0, and at each step: compute the vector of current correlations, ĉ = c(Ŷ) = Xᵀ(Y − Ŷ); then take a small step ɛ in the direction of the greatest current correlation: Ŷ ← Ŷ + ɛ sign(ĉⱼ) xⱼ, with j = argmaxᵢ |ĉᵢ|.
33 Choice of ɛ. Forward Selection technique: we take ɛ = ĉⱼ; if a feature seems interesting, we add it with βⱼ = 1. Could we weight the influence of xⱼ? Forward Stagewise Linear Regression: we take ɛ ≪ 1; if a feature seems interesting, we add it with βⱼ ≪ 1. Another feature can be selected at the next step if it is more correlated with Ŷ. We adapt the weights to the features, but the computation can be time-consuming.
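The stagewise procedure from the two slides above can be sketched as follows (synthetic data; the step size and iteration count are arbitrary illustrative choices):

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=3000):
    """Forward stagewise: repeatedly nudge the most correlated feature by eps."""
    beta = np.zeros(X.shape[1])
    y_hat = np.zeros_like(y)
    for _ in range(n_steps):
        c = X.T @ (y - y_hat)          # current correlations c(Y_hat)
        j = int(np.argmax(np.abs(c)))  # direction of greatest correlation
        beta[j] += eps * np.sign(c[j])
        y_hat = X @ beta
    return beta

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.05 * rng.normal(size=40)
beta = forward_stagewise(X, y)  # creeps toward the least-squares fit
```

The many tiny steps are exactly the time-consuming part that LARS (next slide) replaces with one optimally sized step per direction.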
34 LARS - Least Angle Regression. Aim: accelerate the computation of the Stagewise procedure. Least Angle Regression: we compute the optimal ɛ, i.e. the optimal step until another feature becomes more interesting, and take the equiangular direction between the selected features. Links with the Lasso and the Elastic Net.
More informationRidge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation
Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationAdaptive Sparseness Using Jeffreys Prior
Adaptive Sparseness Using Jeffreys Prior Mário A. T. Figueiredo Institute of Telecommunications and Department of Electrical and Computer Engineering. Instituto Superior Técnico 1049-001 Lisboa Portugal
More informationLecture 2: Priors and Conjugacy
Lecture 2: Priors and Conjugacy Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 6, 2014 Some nice courses Fred A. Hamprecht (Heidelberg U.) https://www.youtube.com/watch?v=j66rrnzzkow Michael I.
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10
COS53: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 0 MELISSA CARROLL, LINJIE LUO. BIAS-VARIANCE TRADE-OFF (CONTINUED FROM LAST LECTURE) If V = (X n, Y n )} are observed data, the linear regression problem
More informationBayesian Linear Regression [DRAFT - In Progress]
Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory
More informationChapter 3. Linear Models for Regression
Chapter 3. Linear Models for Regression Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Linear
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More information6. Regularized linear regression
Foundations of Machine Learning École Centrale Paris Fall 2015 6. Regularized linear regression Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr
More informationHierarchical Models & Bayesian Model Selection
Hierarchical Models & Bayesian Model Selection Geoffrey Roeder Departments of Computer Science and Statistics University of British Columbia Jan. 20, 2016 Contact information Please report any typos or
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationOutline Introduction OLS Design of experiments Regression. Metamodeling. ME598/494 Lecture. Max Yi Ren
1 / 34 Metamodeling ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 1, 2015 2 / 34 1. preliminaries 1.1 motivation 1.2 ordinary least square 1.3 information
More informationBayesian Microwave Breast Imaging
Bayesian Microwave Breast Imaging Leila Gharsalli 1 Hacheme Ayasso 2 Bernard Duchêne 1 Ali Mohammad-Djafari 1 1 Laboratoire des Signaux et Systèmes (L2S) (UMR8506: CNRS-SUPELEC-Univ Paris-Sud) Plateau
More informationThe connection of dropout and Bayesian statistics
The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationDATA MINING AND MACHINE LEARNING
DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems
More information