Bayesian methods in economics and finance
- Rafe Parker
Slide 1: Bayesian methods in economics and finance
Linear regression: Bayesian model selection and sparsity priors
Slide 2: Linear regression
Model for the relationship between (several) independent variables x = (x_1, ..., x_{D-1}) and a dependent variable y:
    y = w_0 + Σ_{i=1}^{D-1} w_i x_i + ε
Structure: linear relationship with parameters w.
Noise: additive observation noise ε ~ N(0, σ_ε²).
Slide 3: Linear regression (cont.)
Convenient matrix notation, where w, x ∈ R^D and x_0 ≡ 1:
    y = w^T x + ε
Slide 4: Linear regression
Example data {(x_n, t_n)}_{n=1}^N. Collect the data in matrices X = (x_1^T, ..., x_N^T)^T ∈ R^{N×D} and t = (t_1, ..., t_N)^T ∈ R^{N×1}; X is also called the design matrix.
Likelihood p(data | θ):
    p(t | X, θ) = Π_{n=1}^N N(t_n | w^T x_n, σ_ε²) = (2π σ_ε²)^{-N/2} exp( -(1/(2σ_ε²)) Σ_{n=1}^N (t_n - w^T x_n)² )
with parameters θ = (w, σ_ε²): weights w = (w_0, ..., w_{D-1})^T and noise variance σ_ε².
Note: the data are modeled as exchangeable.
Slide 5: Ordinary least squares
Maximum likelihood solution:
    max_{w, σ_ε} ln p(t | X, w, σ_ε) = max_{w, σ_ε} [ -(N/2) ln σ_ε² - (1/(2σ_ε²)) Σ_{n=1}^N (t_n - w^T x_n)² ]
Find the optimal weight vector w*:
    ∂/∂w_i ln p(t | X, w, σ_ε) = (1/σ_ε²) Σ_{n=1}^N (t_n - w^T x_n) x_{in} = 0  (set to zero)
    ⇒ w^T Σ_{n=1}^N x_n x_n^T = Σ_{n=1}^N t_n x_n^T
    ⇒ w* = (X^T X)^{-1} X^T t
This minimizes the squared error (1/2) Σ_{n=1}^N (t_n - w^T x_n)².
X† = (X^T X)^{-1} X^T is the pseudo-inverse of the design matrix X.
Slide 6: Ordinary least squares (cont.)
Find the optimal noise precision τ_ε*:
    ∂/∂τ_ε ln p(t | X, w, τ_ε) = N/(2τ_ε) - (1/2) Σ_{n=1}^N (t_n - w^T x_n)² = 0  (set to zero)
    ⇒ 1/τ_ε* = σ_ε²* = (1/N) Σ_{n=1}^N (t_n - w*^T x_n)²
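A minimal NumPy sketch of these maximum-likelihood formulas; the toy data and all variable names are illustrative, not from the slides:

```python
import numpy as np

# Toy data: N points, D = 3 features with x_0 = 1 as intercept column.
rng = np.random.default_rng(0)
N, D = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, D - 1))])
w_true = np.array([1.0, 2.0, -0.5])
t = X @ w_true + rng.normal(scale=0.3, size=N)

# ML weights: w* = (X^T X)^{-1} X^T t, computed via lstsq for numerical stability.
w_ml, *_ = np.linalg.lstsq(X, t, rcond=None)

# ML noise variance: 1/tau_eps* = (1/N) sum_n (t_n - w*^T x_n)^2.
sigma2_ml = np.mean((t - X @ w_ml) ** 2)
```

With enough data, `w_ml` recovers `w_true` up to noise, and `sigma2_ml` estimates the true noise variance 0.09.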
Slide 7: Bayesian linear regression
Assume the noise precision τ_ε is known. The conjugate prior for the weights w is Gaussian, p(w) = N(w | 0, Σ_0). The posterior is again Gaussian, with
    covariance  Σ_N = (Σ_0^{-1} + τ_ε X^T X)^{-1}
    mean        μ_N = τ_ε Σ_N X^T t
Slide 8: Bayesian linear regression
The posterior is found by completing the square:
    p(w | X, t, τ_ε) ∝ exp( -(τ_ε/2) Σ_{n=1}^N (t_n - w^T x_n)² ) · exp( -(1/2) w^T Σ_0^{-1} w )
                     = exp( -(1/2) [ τ_ε (Xw - t)^T (Xw - t) + w^T Σ_0^{-1} w ] )
                     ∝ exp( -(1/2) w^T (τ_ε X^T X + Σ_0^{-1}) w + w^T τ_ε X^T t )
so that Σ_N^{-1} = τ_ε X^T X + Σ_0^{-1} and Σ_N^{-1} μ_N = τ_ε X^T t.
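The completed square translates directly into code; a sketch on toy data, where the hyperparameter values `tau0` (prior precision) and `tau_eps` (noise precision) are assumptions for illustration:

```python
import numpy as np

# Toy data; tau0 and tau_eps are assumed hyperparameter values.
rng = np.random.default_rng(1)
N, D = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, D - 1))])
t = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=N)
tau0, tau_eps = 1.0, 1.0 / 0.25

# Read off from the completed square:
# Sigma_N^{-1} = tau_eps X^T X + Sigma_0^{-1},  Sigma_N^{-1} mu_N = tau_eps X^T t
A = tau_eps * X.T @ X + tau0 * np.eye(D)      # posterior precision
Sigma_N = np.linalg.inv(A)                    # posterior covariance
mu_N = tau_eps * Sigma_N @ X.T @ t            # posterior mean
```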
Slide 9: Posterior intervals
Consider the posterior p(θ | D) of a single parameter θ. To quantify the uncertainty, define an interval [a, b] which contains 95% of the posterior probability:
    P(a < θ < b | D) = 95%
Common choices are:
- symmetric intervals [θ̂ - x, θ̂ + x] around the posterior mean or mode θ̂
- intervals with p(θ = a | D) = p(θ = b | D)
- intervals with a, b being fixed quantiles
Also called credible intervals or Bayesian confidence intervals.
Slide 10: Posterior intervals
Example: estimating the mean of a Gaussian N(μ, σ²) with an uninformative prior:
    P( μ̂ - 1.96 σ/√N < μ < μ̂ + 1.96 σ/√N ) = 95%
The suggestive notation hides that the interpretation is very different from a confidence interval:
- Confidence interval: the parameter μ is fixed and the random interval covers μ with probability 95%. Coverage is a repeated-sampling concept.
- Credible interval: the data, i.e. μ̂, are fixed and the uncertain parameter lies in the interval with probability 95%.
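A small sketch of the interval computation for the Gaussian-mean example; the toy data are illustrative and σ is treated as known:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, N = 2.0, 400
x = rng.normal(loc=5.0, scale=sigma, size=N)

mu_hat = x.mean()
half = 1.96 * sigma / np.sqrt(N)           # half-width 1.96 * sigma / sqrt(N)
interval = (mu_hat - half, mu_hat + half)  # 95% interval for mu
```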
Slide 11: Posterior intervals
A frequentist interval I(D) with coverage α satisfies
    P(θ ∈ I(D) | θ) = α   for all θ.
With a Bayesian prior p(θ) it holds that
    α = ∫ P(θ ∈ I(D) | θ) p(θ) dθ = ∫∫ 1[θ ∈ I(D)] p(D, θ) dD dθ = ∫ P(θ ∈ I(D) | D) p(D) dD
Thus, Bayesian intervals
- have good coverage on average,
- but come without worst-case guarantees, i.e. the coverage for surprising data can be low.
Slide 12: Predictive distribution
Predict a new data point at x_new. Predictive distribution:
    p(t_new | x_new) = ∫ p(t_new | x_new, w) p(w | X, t) dw
Again a Gaussian distribution, with
    mean      μ_N^T x_new
    variance  σ_ε² + x_new^T Σ_N x_new
i.e. noise variance + uncertainty about the parameters. Do not pick the best parameters, but take the uncertainty into account ⇒ Bayesian slogan: don't optimize, integrate!
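A sketch of the predictive mean and variance at a new input; the toy data and hyperparameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 80, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, D - 1))])
t = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(scale=0.3, size=N)
tau0, tau_eps = 1.0, 1.0 / 0.09

# Posterior over the weights, as on the previous slides.
Sigma_N = np.linalg.inv(tau0 * np.eye(D) + tau_eps * X.T @ X)
mu_N = tau_eps * Sigma_N @ X.T @ t

# Predictive Gaussian at x_new: noise variance + parameter uncertainty.
x_new = np.array([1.0, 0.2, -0.3])
pred_mean = mu_N @ x_new
pred_var = 1.0 / tau_eps + x_new @ Sigma_N @ x_new
```

Note that `pred_var` always exceeds the pure noise variance 1/τ_ε, since Σ_N is positive definite.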
Slide 13: Regularization
Compare the ML solution
    w* = (X^T X)^{-1} X^T t,
which minimizes (1/2) Σ_{n=1}^N (t_n - w^T x_n)², with the posterior maximum (for Σ_0^{-1} = τ_0 I)
    μ_N = τ_ε (τ_0 I + τ_ε X^T X)^{-1} X^T t,
which minimizes the regularized mean squared error
    (1/2) Σ_{n=1}^N (t_n - w^T x_n)² + (λ/2) w^T w,   where λ = τ_0/τ_ε
(squared error + regularizer).
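The equivalence between the posterior maximum and the ridge solution can be checked numerically; a sketch with arbitrary toy values:

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 60, 4
X = rng.normal(size=(N, D))
t = rng.normal(size=N)
tau0, tau_eps = 2.0, 5.0

# Posterior maximum: mu_N = tau_eps (tau0 I + tau_eps X^T X)^{-1} X^T t
mu_N = tau_eps * np.linalg.solve(tau0 * np.eye(D) + tau_eps * X.T @ X, X.T @ t)

# Ridge solution with lambda = tau0 / tau_eps: (X^T X + lambda I)^{-1} X^T t
lam = tau0 / tau_eps
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ t)
```

Both solves return the same vector, confirming that the regularization strength is the precision ratio λ = τ_0/τ_ε.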
Slide 14: Regularization
The quadratic regularizer controls model complexity by shrinking the weights toward zero (reduces variance).
Lasso — absolute-value regularizer λ Σ_{i=1}^D |w_i|:
- favors sparse solutions, i.e. some weights are exactly zero
- corresponds to a Laplace prior p(w) ∝ exp( -λ Σ_i |w_i| )
Slide 15: Regularization (figure from Bishop, "Regularized Least Squares (3)")
Lasso tends to generate sparser solutions than a quadratic regularizer.
Slide 16: Bias-variance decomposition
Consider the expected mean squared error:
    E[(y(x; w) - t)²] = ∫∫ (y(x; w) - t)² p(x, t) dx dt
                      = E[(y(x; w) - h(x))²] + E[(h(x) - t)²] + 2 E[(y(x; w) - h(x))(h(x) - t)]
where the last term vanishes since h(x) = E[t | x].
The regression function is y(x; w) = w^T x; the optimal choice is h(x) = E[t | x]. In practice ŵ is estimated from a data set D:
    E_D[(y(x; ŵ) - h(x))²] = (E_D[y(x; ŵ)] - h(x))²                                  [bias²]
                            + E_D[(y(x; ŵ) - E_D[y(x; ŵ)])²]                          [variance]
                            + 2 E_D[(y(x; ŵ) - E_D[y(x; ŵ)])(E_D[y(x; ŵ)] - h(x))]    [= 0]
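The decomposition can be illustrated by Monte Carlo over repeated data sets; a sketch using a linear toy model rather than the sinusoidal example (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
D, N, R, lam = 3, 30, 500, 10.0        # R data sets, heavily regularized fit
w_true = np.array([1.0, -2.0, 0.5])
x_test = np.array([1.0, 0.5, -0.5])
h = w_true @ x_test                    # h(x) = E[t | x] at the test point

preds = np.empty(R)
for r in range(R):
    X = np.column_stack([np.ones(N), rng.normal(size=(N, D - 1))])
    t = X @ w_true + rng.normal(scale=0.5, size=N)
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ t)
    preds[r] = w_hat @ x_test

bias2 = (preds.mean() - h) ** 2        # (E_D[y] - h)^2
variance = preds.var()                 # E_D[(y - E_D[y])^2]
```

By construction, bias² + variance exactly equals the average squared deviation of the fitted predictions from h(x).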
Slides 17-19: Bias-variance decomposition (figures from Bishop, "The Bias-Variance Decomposition (5)-(7)")
Example: 25 data sets from the sinusoidal, varying the degree of regularization λ.
Slide 20: Bias-variance trade-off (figure from Bishop)
From these plots, we note that an over-regularized model (large λ) will have a high bias, while an under-regularized model (small λ) will have a high variance.
Slide 21: Bayesian model selection
The maximum likelihood max_θ p(t | X, θ) improves when the number of parameters increases; this leads to over-fitting as the model picks up noise in the data. To achieve a low expected loss we need to control
    bias² (decreases with model complexity) + variance (increases with model complexity).
Regularization favors less flexible models by penalizing large weights.
Marginal likelihood / evidence:
    p(t | X) = ∫ p(t | X, w) p(w) dw
The Bayesian answer to over-fitting: the probability of the data, accounting for parameter uncertainty.
Slide 22: Bayesian model selection
Consider a set of models {M_i}, e.g. regression models with a different number/subset of covariates:
- prior probability p(M_i) of each model
- compute the posterior: p(M_i | D) ∝ p(D | M_i) p(M_i)
Note that each model has parameters w_i and thus
    p(D | M_i) = ∫ p(D, w_i | M_i) dw_i = ∫ p(D | w_i, M_i) p(w_i | M_i) dw_i
The marginal likelihood (w.r.t. the parameters) is the likelihood (w.r.t. the model)!
Note: the marginal likelihood requires a proper prior.
Slide 23: Bayesian model selection
Two models M_1, M_2 are then compared based on the posterior odds
    p(M_1 | D) / p(M_2 | D) = [ p(D | M_1) / p(D | M_2) ] · [ p(M_1) / p(M_2) ]
                               (Bayes factor)                (prior odds)
Interpretation of evidence from Bayes factors as suggested by Jeffreys:
    Bayes factor   decibans   Evidence
    < 1            < 0        negative (supports M_2)
    1 - 3.2        0 - 5      barely worth mentioning
    3.2 - 10       5 - 10     substantial
    10 - 31.6      10 - 15    strong
    31.6 - 100     15 - 20    very strong
    > 100          > 20       decisive
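The odds arithmetic and the deciban scale are each a one-liner; a small sketch (the function names are illustrative):

```python
import numpy as np

def posterior_odds(bayes_factor, prior_odds=1.0):
    """Posterior odds = Bayes factor * prior odds."""
    return bayes_factor * prior_odds

def decibans(bayes_factor):
    """Evidence on the deciban scale: 10 * log10(Bayes factor)."""
    return 10.0 * np.log10(bayes_factor)
```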
Slide 24: Bayesian model selection
The marginal likelihood can be computed analytically:
- Prior: p(w) = N(w | 0, τ_0^{-1} I)
- Likelihood: p(t | X, w) = N(t | Xw, τ_ε^{-1} I)
- Evidence, for any w':
    ln p(t | X) = ln [ p(t | X, w') p(w') / p(w' | X, t) ]
Evaluating at w' = μ_N gives
    ln p(t | X) = (D/2) ln τ_0 + (N/2) ln τ_ε - (N/2) ln 2π
                  - (τ_ε/2) (Xμ_N - t)^T (Xμ_N - t) - (τ_0/2) μ_N^T μ_N - (1/2) ln |A|
where μ_N = τ_ε A^{-1} X^T t and A = τ_0 I + τ_ε X^T X = Σ_N^{-1}.
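The closed form can be cross-checked against the direct marginal of the data, t ~ N(0, τ_ε^{-1} I + τ_0^{-1} X X^T); a sketch on toy data (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
N, D = 40, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, D - 1))])
t = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(scale=0.4, size=N)
tau0, tau_eps = 1.5, 1.0 / 0.16

# Closed-form log evidence from the slide.
A = tau0 * np.eye(D) + tau_eps * X.T @ X
mu_N = tau_eps * np.linalg.solve(A, X.T @ t)
r = X @ mu_N - t
logev_closed = (D / 2) * np.log(tau0) + (N / 2) * np.log(tau_eps) \
    - (N / 2) * np.log(2 * np.pi) - (tau_eps / 2) * (r @ r) \
    - (tau0 / 2) * (mu_N @ mu_N) - 0.5 * np.linalg.slogdet(A)[1]

# Direct marginal: t ~ N(0, C) with C = tau_eps^{-1} I + tau0^{-1} X X^T.
C = np.eye(N) / tau_eps + X @ X.T / tau0
logev_direct = -0.5 * (N * np.log(2 * np.pi)
                       + np.linalg.slogdet(C)[1] + t @ np.linalg.solve(C, t))
```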
Slides 25-28: Bayesian model selection (figures from Bishop)
Polynomial fits of order 0, 1, 3 and 9.
Slide 29: Bayesian model selection (figure from Bishop, "The Evidence Approximation (3)")
Example: sinusoidal data, M-th degree polynomial.
Slide 30: Zellner's g-prior
An empirical prior with particularly easy-to-compute Bayes factors:
- Jeffreys prior p(σ_ε²) ∝ 1/σ_ε²
- Empirical g-prior for the regression weights: w ~ N(0, g σ_ε² (X^T X)^{-1})
The Bayes factor between a model using the subset γ of regressors and the null model (intercept only) is
    p(D | M_γ) / p(D | M_0) = (1 + g)^{(N - p_γ - 1)/2} [1 + g (1 - R_γ²)]^{-(N - 1)/2}
where R_γ² is the coefficient of determination and p_γ the number of regressors in γ.
Note: any two models can be compared via the null model,
    p(D | M_γ) / p(D | M_γ') = [ p(D | M_γ) / p(D | M_0) ] · [ p(D | M_0) / p(D | M_γ') ].
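A sketch of the Bayes-factor formula above; the helper name and the argument `p_gamma` (number of regressors in γ) are illustrative:

```python
import numpy as np

def gprior_bayes_factor(n, p_gamma, r2, g):
    """Bayes factor of the model with p_gamma regressors and R^2 = r2
    against the null (intercept-only) model under the g-prior."""
    return (1 + g) ** ((n - p_gamma - 1) / 2) \
        * (1 + g * (1 - r2)) ** (-(n - 1) / 2)
```

The two paradoxes on the next slide show up directly: at r2 = 1 the factor equals (1+g)^{(n-p_γ-1)/2} and stays bounded, while for fixed r2 < 1 it eventually decays toward 0 as g grows.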
Slide 31: Zellner's g-prior
How to choose g?
- Bartlett paradox: the general problem of an uninformative prior compared against a point hypothesis:
    p(D | M_γ) / p(D | M_0) → 0   as g → ∞
- Information paradox: consider overwhelming support for M_γ, e.g. R_γ² → 1. The Bayes factor stays bounded:
    p(D | M_γ) / p(D | M_0) → (1 + g)^{(N - p_γ - 1)/2}
Need to increase g with the number N of data points:
- Empirical Bayes: choose g_{ML-II} = argmax_g p(D | M_γ)
- Hyperprior for g: Zellner-Siow prior g ~ Inv-Gamma(1/2, N/2); hyper-g prior g/(1+g) ~ Beta(1, α/2 - 1)
All choices are asymptotically consistent for model selection and prediction.
Slide 32: Evidence approximation
How to choose hyperparameters, e.g. τ_0?
- Bayesian ideal: assume a prior p(τ_0) and integrate. Often analytically intractable, and it introduces new hyperparameters ... we have to stop at some point.
- Evidence approximation: optimize the marginal likelihood p(D | τ_0), i.e. ML-II. Tends to work well in practice with little over-fitting. Amounts to model selection from {M_{τ_0}}_{τ_0 > 0}.
Slide 33: Sparsity priors
Alternative to model selection: fit a large model with many parameters, but use a prior that favors sparse models, e.g. many regression weights being very small:
- Hierarchical specification: w ~ N(0, diag(σ_1², ..., σ_D²)) with 1/σ_i² ~ Gamma(α, β). The evidence approximation for the hyperparameters σ_1, ..., σ_D is called Automatic Relevance Determination (ARD).
- Laplace prior p(w_i) ∝ exp(-|w_i|): the Bayesian LASSO.
- Spike-and-slab prior:
    w_i = 0              with prob. q
    w_i ~ N(0, σ_w²)     with prob. 1 - q
    q ~ Uniform[0, 1]
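Drawing from the spike-and-slab prior makes the exact zeros visible; a sketch with q fixed for illustration rather than drawn from Uniform[0, 1]:

```python
import numpy as np

rng = np.random.default_rng(7)
D, q, sigma_w = 100_000, 0.5, 1.0

# With probability q the weight sits in the spike (exactly 0),
# otherwise it is drawn from the Gaussian slab N(0, sigma_w^2).
spike = rng.random(D) < q
w = np.where(spike, 0.0, rng.normal(scale=sigma_w, size=D))
frac_zero = np.mean(w == 0.0)
```

Roughly a fraction q of the sampled weights is exactly zero, which is the sparsity the prior encodes.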
Slides 34-35: Sparsity priors
Modern sparsity prior: the horseshoe prior
    w_i ~ N(0, σ_i² σ²),   σ_i ~ C⁺(0, 1)
with the standard half-Cauchy distribution C⁺(0, 1).
Compared to the LASSO: stronger shrinkage of small weights, smaller bias for large weights.
Figure from: C. M. Carvalho et al., Handling Sparsity via the Horseshoe, Proceedings of the 12th International Conference on Artificial Intelligence and Statistics.
Slide 36: Cross-validation
Idea: select the model based on the generalization error, i.e. the best predictions on novel data. Cross-validation estimates the generalization error:
- Partition the data set D into K parts D_1, ..., D_K
- For each i = 1, ..., K:
  1. train the model on D \ D_i
  2. evaluate the model on D_i, e.g. using the predictive distribution
- The predictive performance is estimated by the average prediction error across D_1, ..., D_K
Common choices in practice:
- K = 10: a good compromise between bias (due to the smaller training sets) and variance (due to small test sets)
- K = N: leave-one-out cross-validation (LOOCV); often efficient if a data point can easily be removed from a trained model
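The K-fold procedure above fits in a few lines of NumPy; a sketch for ridge regression, where the helper name and the λ grid are illustrative:

```python
import numpy as np

def kfold_mse(X, t, K, lam):
    """Average test MSE of ridge regression over K folds."""
    folds = np.array_split(np.arange(len(t)), K)
    errs = []
    for i in range(K):
        test = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Train on D \ D_i ...
        w = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(X.shape[1]),
                            X[train].T @ t[train])
        # ... evaluate on D_i.
        errs.append(np.mean((t[test] - X[test] @ w) ** 2))
    return np.mean(errs)

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 5))
t = X @ rng.normal(size=5) + rng.normal(scale=0.2, size=100)
scores = {lam: kfold_mse(X, t, K=10, lam=lam) for lam in (0.01, 1.0, 100.0)}
```

Selecting the λ with the smallest score implements model selection over {M_λ} by estimated generalization error.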
Slide 37: Model selection
Model selection summary:
                    Evidence                    Cross-validation
    Philosophy      Bayesian                    Frequentist
    Prior           p(θ) matters                prior insensitive
    Data use        model all data p(D)         predict part of the data p(D_i | D \ D_i)
    Evaluation      probability                 different error functions
    Asymptotics     consistent                  often inconsistent, e.g. LOOCV
    Interpretation  data compression            prediction
Both are often computationally demanding. Related criteria:
    BIC = -2 log p(D | θ_ML) + D log N
    AIC = -2 log p(D | θ_ML) + 2D
More informationAn Introduction to Bayesian Linear Regression
An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September
More informationMachine Learning Linear Regression. Prof. Matteo Matteucci
Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares
More informationSome Curiosities Arising in Objective Bayesian Analysis
. Some Curiosities Arising in Objective Bayesian Analysis Jim Berger Duke University Statistical and Applied Mathematical Institute Yale University May 15, 2009 1 Three vignettes related to John s work
More informationToday. Calculus. Linear Regression. Lagrange Multipliers
Today Calculus Lagrange Multipliers Linear Regression 1 Optimization with constraints What if I want to constrain the parameters of the model. The mean is less than 10 Find the best likelihood, subject
More informationNeutron inverse kinetics via Gaussian Processes
Neutron inverse kinetics via Gaussian Processes P. Picca Politecnico di Torino, Torino, Italy R. Furfaro University of Arizona, Tucson, Arizona Outline Introduction Review of inverse kinetics techniques
More informationST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks
(9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate
More informationBayesian Regression (1/31/13)
STA613/CBB540: Statistical methods in computational biology Bayesian Regression (1/31/13) Lecturer: Barbara Engelhardt Scribe: Amanda Lea 1 Bayesian Paradigm Bayesian methods ask: given that I have observed
More informationProbabilistic Graphical Models Lecture 20: Gaussian Processes
Probabilistic Graphical Models Lecture 20: Gaussian Processes Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 30, 2015 1 / 53 What is Machine Learning? Machine learning algorithms
More informationRegression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh.
Regression Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh September 24 (All of the slides in this course have been adapted from previous versions
More informationThe Laplace Approximation
The Laplace Approximation Sargur N. University at Buffalo, State University of New York USA Topics in Linear Models for Classification Overview 1. Discriminant Functions 2. Probabilistic Generative Models
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationPhysics 403. Segev BenZvi. Choosing Priors and the Principle of Maximum Entropy. Department of Physics and Astronomy University of Rochester
Physics 403 Choosing Priors and the Principle of Maximum Entropy Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Odds Ratio Occam Factors
More informationStatistical Inference
Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park
More informationMachine Learning Srihari. Probability Theory. Sargur N. Srihari
Probability Theory Sargur N. Srihari srihari@cedar.buffalo.edu 1 Probability Theory with Several Variables Key concept is dealing with uncertainty Due to noise and finite data sets Framework for quantification
More informationCross-sectional space-time modeling using ARNN(p, n) processes
Cross-sectional space-time modeling using ARNN(p, n) processes W. Polasek K. Kakamu September, 006 Abstract We suggest a new class of cross-sectional space-time models based on local AR models and nearest
More informationEstimation. Max Welling. California Institute of Technology Pasadena, CA
Preliminaries Estimation Max Welling California Institute of Technology 36-93 Pasadena, CA 925 welling@vision.caltech.edu Let x denote a random variable and p(x) its probability density. x may be multidimensional
More informationBayesian Analysis for Natural Language Processing Lecture 2
Bayesian Analysis for Natural Language Processing Lecture 2 Shay Cohen February 4, 2013 Administrativia The class has a mailing list: coms-e6998-11@cs.columbia.edu Need two volunteers for leading a discussion
More informationNaïve Bayes. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824
Naïve Bayes Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative HW 1 out today. Please start early! Office hours Chen: Wed 4pm-5pm Shih-Yang: Fri 3pm-4pm Location: Whittemore 266
More informationComputer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression
Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:
More informationIntroduction into Bayesian statistics
Introduction into Bayesian statistics Maxim Kochurov EF MSU November 15, 2016 Maxim Kochurov Introduction into Bayesian statistics EF MSU 1 / 7 Content 1 Framework Notations 2 Difference Bayesians vs Frequentists
More information