Bayesian Regression of Piecewise Constant Functions

Marcus Hutter
Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Galleria 2, CH-6928 Manno-Lugano, Switzerland
marcus@idsia.ch, marcus
Valencia ISBA, 1-6 June 2006

Table of Contents

Bayesian Regression
Quantities of Interest
Efficient Solutions by Dynamic Programming
Determination of the Hyper-Parameters
Example: Gene Expression Data
Extensions
Summary

Abstract

I derive an exact and efficient Bayesian regression algorithm for piecewise constant functions of unknown segment number, boundary location, and levels. It works for any noise and segment level prior, e.g. Cauchy, which can handle outliers. I derive simple but good estimates for the in-segment variance. I also propose a Bayesian regression curve as a better way of smoothing data without blurring boundaries. The Bayesian approach also allows straightforward determination of the evidence, break probabilities and error estimates, useful for model selection and significance and robustness studies. I present an application to microarray-CGH data analysis. Many possible extensions will be discussed.

Keywords: Bayesian regression, exact polynomial algorithm, non-parametric inference, piecewise constant function, dynamic programming, application, microarray-CGH data.

Advantages of Bayesian Regression

Very principled, hence involves fewer heuristic design choices. Important for estimating the number of segments.
One can decide among competing models solely on evidence.
Bayes often works well in theory and practice.
Probability estimates and variances for quantities of interest.
Bayesian regression curve (better than local smoothing, which wiggles more and blurs jumps).

Setup / Likelihood

True function $f = (f_1, \ldots, f_n)$ has $k$ segments with boundaries $0 = t_0 < t_1 < \ldots < t_{k-1} < t_k = n$, i.e. $f$ is constant on $\{t_{q-1}+1, \ldots, t_q\}$ for each $0 < q \le k$.
Noisy observations $y = (y_1, \ldots, y_n)$.
Any independent noise with mean $\mu_q$ and variance $\sigma^2$.
[Figure: data, break probability, PC regression, and STD($\mu_t \mid y$).]
$\Rightarrow$ Likelihood, given the boundaries $t$ and segment number $k$: $P(y \mid \mu, \sigma) = \prod_{q=1}^{k} \prod_{i=t_{q-1}+1}^{t_q} P(y_i \mid \mu_q, \sigma)$
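The double product in the likelihood can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the noise is taken to be Gaussian (the slides allow any independent noise), and the names `gauss_logpdf` and `log_likelihood` are hypothetical, not taken from the talk.

```python
import math

def gauss_logpdf(y, mu, sigma):
    # log N(y; mu, sigma^2); Gaussian noise is an assumption of this sketch
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)

def log_likelihood(y, mu, t, sigma):
    """log of prod_q prod_{i=t_{q-1}+1}^{t_q} P(y_i | mu_q, sigma).

    y  : list of n observations (y[0] holds y_1)
    mu : list of k segment levels
    t  : boundaries [t_0 = 0, t_1, ..., t_k = n]
    """
    assert t[0] == 0 and t[-1] == len(y) and len(mu) == len(t) - 1
    total = 0.0
    for q in range(1, len(t)):
        # 0-based indices t[q-1] .. t[q]-1 hold y_{t_{q-1}+1} .. y_{t_q}
        for i in range(t[q - 1], t[q]):
            total += gauss_logpdf(y[i], mu[q - 1], sigma)
    return total

# Example: two segments of two points each, levels 0 and 1
print(log_likelihood([0.0, 0.0, 1.0, 1.0], [0.0, 1.0], [0, 2, 4], 1.0))
```

Working with the logarithm avoids numerical underflow when many densities are multiplied.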

Bayesian Regression

Goal: Estimate segment levels $\mu = (\mu_1, \ldots, \mu_k)$, boundaries $t = (t_0, \ldots, t_k)$, and their number $k$.
Bayesian regression: Assume prior $P(\mu, t, k)$.
Compute posterior: $P(\mu, t, k \mid y) = \frac{P(y \mid \mu, t, k)\,P(\mu, t, k)}{P(y)}$
Evidence: $P(y) = \sum_{k,t} \int P(y \mid \mu, t, k)\,P(\mu, t, k)\,d\mu$
Too complex: We need summaries like mean or MAP.

Prior

We model the level of each segment by a broad (e.g. Gaussian) distribution $P(\mu_q \mid \nu, \rho)$.
Uniform distribution among all segmentations into $k$ segments: $P(t \mid k) = \binom{n-1}{k-1}^{-1}$
Uniform prior over segment number $k$: $P(k) = 1/n$.
$\Rightarrow$ Prior: $P(\mu, t, k) = \prod_{q=1}^{k} P(\mu_q \mid \nu, \rho) \cdot P(t \mid k) \cdot P(k)$
$(\rho, \nu, \sigma)$ are fixed hyper-parameters determined later.
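As a quick sanity check of the segmentation part of the prior, the following sketch enumerates every segmentation for a small $n$ and confirms that $P(t \mid k)\,P(k)$ sums to one. The function names are hypothetical, and a segmentation is represented as a choice of $k-1$ interior boundaries from $\{1, \ldots, n-1\}$.

```python
import math
from itertools import combinations

def log_prior_segmentation(n, k):
    # log[P(t|k) P(k)] with P(t|k) = 1 / C(n-1, k-1) and P(k) = 1/n
    return -math.log(math.comb(n - 1, k - 1)) - math.log(n)

def total_prior_mass(n):
    # Sum P(t|k) P(k) over all k and all segmentations t (feasible for small n only)
    total = 0.0
    for k in range(1, n + 1):
        for _ in combinations(range(1, n), k - 1):
            total += math.exp(log_prior_segmentation(n, k))
    return total

print(total_prior_mass(6))  # should be 1 up to rounding
```

The enumeration has $2^{n-1}$ terms, which is exactly the cost that the dynamic programming recursions avoid.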

Quantities of Interest

# Segments: $\hat k = \arg\max_k P(k \mid y)$
Boundaries: $\hat t_q = \arg\max_{t_q} P(t_q \mid y, \hat k)$
Segment level: $\hat\mu_q = E[\mu_q \mid y, \hat t, \hat k] = \int P(\mu_q \mid y, \hat t, \hat k)\,\mu_q\,d\mu_q$
The estimate $(\hat\mu, \hat t, \hat k)$ defines a (single) piecewise constant (PC) function $\hat f$, which is our estimate of $f$.
A (very) different quantity is to Bayes-average over all piecewise constant functions and to ask for the mean at location $i$ as an estimate for $f_i$:
Regression curve: $\hat\mu_i = E[\mu_i \mid y] = \int P(\mu_i \mid y)\,\mu_i\,d\mu_i$

Dynamic Programming

Dynamic programming: Fix a break, then data left and right of the break are independent.
Evidence and moments of a single segment from $i+1$ to $j$: $A^r_{ij} := \int P(\mu_m) \prod_{t=i+1}^{j} P(y_t \mid \mu_m)\,\mu_m^r\,d\mu_m$
Analytical for exponential family with conjugate prior like Gauss, and numerical for others like Cauchy.
$L_{kj}$: $P(y_1..y_j \mid k)$ of first $j$ data, given $k$ segments.
$R_{ki}$: $P(y_{i+1}..y_n \mid k)$ of last $n-i$ data, given $k$ segments.
(Both up to the normalization of the uniform segmentation prior, e.g. $P(y \mid k) = L_{kn} / \binom{n-1}{k-1}$.)
Left recursion: Evidence of $y_1..y_j$ with $k+1$ segments = evidence of $y_1..y_h$ with $k$ segments $\times$ single-segment evidence of $y_{h+1}..y_j$, summed over all locations $h$ of boundary $k$: $L_{k+1,j} = \sum_{h=k}^{j-1} L_{kh} A^0_{hj}$
Similarly, right recursion: $R_{k+1,i} = \sum_{h=i+1}^{n-k} A^0_{ih} R_{kh}$
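The two recursions can be sketched as follows. This is a minimal illustration, not the author's implementation. It assumes Gaussian noise with a Gaussian level prior, so that $A^0_{ij}$ has the standard closed form used in `seg_evidence`, and it assumes the start values $L_{0j} = \delta_{j0}$ and $R_{0i} = \delta_{in}$. All function names are hypothetical. It works with plain probabilities, which is fine for short series but would need log-space arithmetic for large $n$.

```python
import math

def seg_evidence(seg, sigma, nu, rho):
    # A^0 of one segment: y_t ~ N(mu, sigma^2), mu ~ N(nu, rho^2) (assumed model)
    d = len(seg)
    ybar = sum(seg) / d
    ss = sum((v - ybar) ** 2 for v in seg)
    log_a0 = (-0.5 * d * math.log(2 * math.pi * sigma**2)
              - 0.5 * math.log(1 + d * rho**2 / sigma**2)
              - ss / (2 * sigma**2)
              - d * (ybar - nu) ** 2 / (2 * (sigma**2 + d * rho**2)))
    return math.exp(log_a0)

def evidence_table(y, sigma, nu, rho):
    # A[i][j] = A^0_ij, the evidence of the single segment y_{i+1}..y_j
    n = len(y)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            A[i][j] = seg_evidence(y[i:j], sigma, nu, rho)
    return A

def left_right(A, n, kmax):
    # L[k][j]: sum over segmentations of y_1..y_j into k segments
    # R[k][i]: sum over segmentations of y_{i+1}..y_n into k segments
    L = [[0.0] * (n + 1) for _ in range(kmax + 1)]
    R = [[0.0] * (n + 1) for _ in range(kmax + 1)]
    L[0][0] = 1.0
    R[0][n] = 1.0
    for k in range(kmax):
        for j in range(k + 1, n + 1):
            L[k + 1][j] = sum(L[k][h] * A[h][j] for h in range(k, j))
        for i in range(n - k):
            R[k + 1][i] = sum(A[i][h] * R[k][h] for h in range(i + 1, n - k + 1))
    return L, R

y = [0.1, -0.2, 0.0, 2.1, 1.9, 2.0]
A = evidence_table(y, sigma=0.3, nu=1.0, rho=1.5)
L, R = left_right(A, len(y), len(y))
print(L[2][len(y)], R[2][0])  # both sum over the same two-segment splits
```

With the table of $A^0_{ij}$ precomputed, each recursion costs $O(k_{\max} n^2)$ operations instead of an enumeration of all segmentations.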

Efficient Solutions for Quantities of Interest

Evidence: $E = P(y) = \sum_{k=1}^{n} P(y \mid k)\,P(k) = \frac{1}{n} \sum_{k=1}^{n} L_{kn} / \binom{n-1}{k-1}$
The posterior of $k$ and its MAP estimate are $P(k \mid y) = \frac{L_{kn}}{\binom{n-1}{k-1}\,k_{\max}\,E}$ and $\hat k = \arg\max_{k=1..k_{\max}} P(k \mid y)$, where $k_{\max} = n$ for the prior $P(k) = 1/n$.
Probability that boundary $p$ is located at $h$: $P(t_p = h \mid y, \hat k) = L_{ph} R_{\hat k - p, h} / L_{\hat k n}$
MAP segment boundary $p$: $\hat t_p := \arg\max_h P(t_p = h \mid y, \hat k)$
Segment level moments: $\widehat{\mu_p^r} = A^r_{\hat t_{p-1} \hat t_p} / A^0_{\hat t_{p-1} \hat t_p}$
Regression curve: Fix a single segment $t_{m-1} = i, \ldots, t_m = j$ containing $t$, then $\mu_t = \mu_m$. Now sum over all such segments: $\widehat{\mu_t^r} = \frac{1}{L_{\hat k n}} \sum_{m=1}^{\hat k} \sum_{i < t \le j} L_{m-1,i}\,A^r_{ij}\,R_{\hat k - m, j}$
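The formulas on this slide can be strung together in one short routine. The sketch below is illustrative only. It assumes Gaussian noise and a Gaussian level prior (so $A^0$ and $A^1$ are available in closed form), uses $k_{\max} = n$, works with plain probabilities rather than logarithms, and assumes that the marginal MAP boundaries come out strictly increasing, which is not guaranteed in general. The names `seg_moments` and `pc_regression` are hypothetical.

```python
import math

def seg_moments(seg, sigma, nu, rho):
    # (A^0, A^1) of one segment: y_t ~ N(mu, sigma^2), mu ~ N(nu, rho^2) (assumed model)
    d = len(seg)
    ybar = sum(seg) / d
    ss = sum((v - ybar) ** 2 for v in seg)
    log_a0 = (-0.5 * d * math.log(2 * math.pi * sigma**2)
              - 0.5 * math.log(1 + d * rho**2 / sigma**2)
              - ss / (2 * sigma**2)
              - d * (ybar - nu) ** 2 / (2 * (sigma**2 + d * rho**2)))
    a0 = math.exp(log_a0)
    post_mean = (nu / rho**2 + d * ybar / sigma**2) / (1 / rho**2 + d / sigma**2)
    return a0, a0 * post_mean  # A^1 = A^0 * E[mu | segment data]

def pc_regression(y, sigma, nu, rho):
    n = len(y)
    A0 = [[0.0] * (n + 1) for _ in range(n + 1)]
    A1 = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            A0[i][j], A1[i][j] = seg_moments(y[i:j], sigma, nu, rho)
    # Left and right recursions with L_0j = delta_j0 and R_0i = delta_in
    L = [[0.0] * (n + 1) for _ in range(n + 1)]
    R = [[0.0] * (n + 1) for _ in range(n + 1)]
    L[0][0] = R[0][n] = 1.0
    for k in range(n):
        for j in range(k + 1, n + 1):
            L[k + 1][j] = sum(L[k][h] * A0[h][j] for h in range(k, j))
        for i in range(n - k):
            R[k + 1][i] = sum(A0[i][h] * R[k][h] for h in range(i + 1, n - k + 1))
    # Evidence, posterior of k, and MAP number of segments
    w = [L[k][n] / math.comb(n - 1, k - 1) for k in range(1, n + 1)]
    evidence = sum(w) / n
    post_k = [wk / (n * evidence) for wk in w]  # post_k[k-1] = P(k | y)
    k_hat = 1 + max(range(n), key=lambda idx: post_k[idx])
    Z = L[k_hat][n]
    # Marginal MAP boundaries (assumed to come out strictly increasing)
    t_hat = [0]
    for p in range(1, k_hat):
        probs = [L[p][h] * R[k_hat - p][h] / Z for h in range(n + 1)]
        t_hat.append(max(range(n + 1), key=lambda h: probs[h]))
    t_hat.append(n)
    # First moments of the segment levels, and the regression curve (r = 1)
    levels = [A1[a][b] / A0[a][b] for a, b in zip(t_hat, t_hat[1:])]
    curve = []
    for t in range(1, n + 1):
        s = sum(L[m - 1][i] * A1[i][j] * R[k_hat - m][j]
                for m in range(1, k_hat + 1)
                for i in range(t) for j in range(t, n + 1))
        curve.append(s / Z)
    return {"evidence": evidence, "post_k": post_k, "k_hat": k_hat,
            "t_hat": t_hat, "levels": levels, "curve": curve}

res = pc_regression([0.1, -0.2, 0.0, 2.1, 1.9, 2.0], sigma=0.3, nu=1.0, rho=1.5)
print(res["k_hat"], res["t_hat"], res["levels"])
```

The regression-curve loop as written costs $O(\hat k\, n^3)$; it is kept naive here so that it mirrors the formula term by term.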

Determination of the Hyper-Parameters

Global variance $\rho^2$ and mean $\nu$ of $\mu$, in-segment variance $\sigma^2$.
(Empirical) Bayes: Averaging or maximizing $P(y \mid \sigma, \nu, \rho)$ is expensive.
Fast semi-principled estimation:
Global mean: $\hat\nu \approx \frac{1}{n} \sum_{t=1}^{n} y_t$
Global variance: $\hat\rho^2 \approx \frac{1}{n-1} \sum_{t=1}^{n} (y_t - \hat\nu)^2$
In-segment variance $\sigma^2$ is more tricky without knowing the segmentation: $\hat\sigma^2 \approx \frac{1}{2(n-1)} \sum_{t=1}^{n-1} (y_{t+1} - y_t)^2$
Good for large noise. A crude estimate is enough if noise is low (regression easy).
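The three fast estimates translate directly into code. The sketch below uses a hypothetical function name and returns the variances $\hat\rho^2$ and $\hat\sigma^2$ rather than their square roots.

```python
def estimate_hyperparameters(y):
    """Return (nu_hat, rho2_hat, sigma2_hat) from the data alone."""
    n = len(y)
    nu = sum(y) / n                                       # global mean
    rho2 = sum((v - nu) ** 2 for v in y) / (n - 1)        # global variance
    # Half the mean squared difference of neighbouring points: within a
    # segment, Var[y_{t+1} - y_t] = 2 sigma^2, hence the factor 2.
    sigma2 = sum((y[t + 1] - y[t]) ** 2 for t in range(n - 1)) / (2 * (n - 1))
    return nu, rho2, sigma2

print(estimate_hyperparameters([0.0, 0.0, 0.0, 2.0, 2.0, 2.0]))
```

In this noise-free toy series the single jump is the only contribution to $\hat\sigma^2$, which illustrates why the estimate is described as good for large noise and only crude when the noise is low.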

Quartiles for Heavy-Tailed Robust Distributions

Let $[y]$ be the data vector $y$ sorted in ascending order.
Global median: $\hat\nu \approx [y]_{n/2}$
Global scale: $\hat\rho \approx \frac{[y]_{3n/4} - [y]_{n/4}}{2\alpha}$ with $\alpha \approx 1$
Differences: $\Delta_t := y_{t+1} - y_t$
In-segment scale: $\hat\sigma \approx \frac{[\Delta]_{3n/4} - [\Delta]_{n/4}}{2\beta}$ with $\beta \approx 1..2$
Iteratively improve them, if the estimates are really not sufficient.
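A minimal sketch of the quartile-based estimates follows. The slide does not say how fractional indices such as $n/4$ are rounded, so the 0-based floor indexing used here is an assumption, as are the function names. The constants `alpha` and `beta` are left as explicit arguments; the values in the usage line are illustrative only.

```python
def half_interquartile(values):
    # ([v]_{3m/4} - [v]_{m/4}) / 2 on the sorted values (0-based floor indices, assumed)
    s = sorted(values)
    m = len(s)
    return (s[(3 * m) // 4] - s[m // 4]) / 2.0

def robust_hyperparameters(y, alpha, beta):
    """Return (nu_hat, rho_hat, sigma_hat) based on quartiles."""
    n = len(y)
    nu = sorted(y)[n // 2]                              # global median
    rho = half_interquartile(y) / alpha                 # global scale
    deltas = [y[t + 1] - y[t] for t in range(n - 1)]    # neighbour differences
    sigma = half_interquartile(deltas) / beta           # in-segment scale
    return nu, rho, sigma

print(robust_hyperparameters([3, 1, 2, 0, 7, 5, 6, 4], alpha=1.0, beta=2.0))
```

Because only order statistics enter, a few extreme outliers leave these estimates essentially unchanged, unlike the mean and variance based versions.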

Example: Gene Copy # Data

All genes in a healthy human cell come in pairs, but can be lost or multiplied in tumor cells.
With modern micro-arrays one can measure the copy-number of genes along a chromosome.
It is important to determine the breaks, where the copy-number changes.
The measurements are very noisy [Pinkel 98].
Hence this is a natural application for piecewise constant regression of noisy (one-dimensional) data.
Regression results of one aberrant and one healthy chromosome (without biological interpretation) are shown...

Aberrant Gene Copy # of Chromosome

[Figure: data (blue), PC regression (PCR, black), break probability (BP, red), and variance$^{1/2}$, i.e. STD($\mu_t \mid y$) (green).]

Aberrant Gene Copy # of Chromosome

[Figure: data with Bayesian regression curve (BayesMean) ± 1 standard deviation.]

log(evidence), Aberrant Gene Copy # of Chromosome

[Figure: $\log P(y)$ (blue) and $\hat k$ (green) as functions of $\sigma$, together with our estimate $\hat\sigma$ of $\arg\max_\sigma P(y)$ and $\hat k(\hat\sigma)$ (black triangles). Legend and axis labels: log(evidence), ML#segments, Estimate, sqrt(varseg).]

Normal Gene Copy # of Chromosome

[Figure: data with Bayesian regression curve.]

Posterior Segment Number Probability $P(k \mid y)$

[Figure: $\log P(k \mid y)$ as a function of $k$ for medium Gaussian noise (GM, black), high Cauchy noise (CH, blue), medium Cauchy noise with Gaussian regression (CMwG, green), aberrant gene expression of chromosome 1 (Gen(3,1), red), and normal gene expression of chromosome 9 (Gen(5,9), pink).]

Synthetic Example: What do you see?

Synthetic Example: PC-Regression

[Figure: data (blue), PC regression (PCR, black), break probability (BP, red), and Var($\mu_t \mid y$) (green).]
Data was indeed sampled from a three-segment function with high Cauchy noise.

Synthetic Example: Bayesian Regression Curve

[Figure: data with Bayesian regression curve ± 1 standard deviation.]

Regression Summary of Gene and other Examples

Setup: Gauss, Cauchy, Low, Medium, High noise, Gene.

Columns:
Name: example name
$\sigma$: true noise scale
$n$: data size
P: method
$\hat\nu$: global mean estimate
$\hat\rho$: global deviation estimate
$\hat\sigma$: in-segment deviation estimate
$\log E$: log-evidence $\log P(y)$
$ll_E$: relative log-likelihood $E[ll \mid \hat f]$
$\sigma_{ll}$: $Var[ll \mid \hat f]^{1/2}$
$\hat k$: optimal number of segments
$C_k(-1,+1)$: confidence $P(\hat k(-1,+1) \mid y)$

Name, P, $C_k(-1,+1)$:
GL G %(0 20)
GM G %(0 29)
GH G %(10 12)
CL C %(0 21)
CM C %(0 27)
CH C %(11 11)
GMwC C %(0 26)
CMwG G %(8 8)
Gen G %(6 6)
Gen G %(0 6)

Extensions

Any generalized one-segment evidence (no problem)
Known segment levels (even easier)
(Non)constant regressors (easy)
Piecewise linear regression (easy)
Continuous regression (harder, approximate)
Non-parametric prior and noise (easy)
Very large n (break into overlapping pieces, heuristic)

Related work

[Sen&Srivastava 75] Frequentist solution for detecting a single break.
[Olshen&al 04] Generalization to pair of breaks. Heuristic recursion for further remaining breaks.
[Jon 03, Lavielle 05] Penalized Maximum Likelihood.
[Endres&Földiák 05] Piecewise constant (PC) Bayesian density estimation.
[Lavielle 05, Endres&Földiák 05] Dynamic programming.

Summary

Full Bayesian PC-regression
(Non)Gaussian noise and prior
Handling of outliers
Analytic estimate for in-segment variance
Bayesian regression curve
Break probabilities and variances
Global evidence for model comparison
Principled, few parameters to choose (important for determination of k).

Thanks! Questions?

Details: Papers at marcus

The book intends to excite a broader AI audience about abstract Algorithmic Information Theory and inform theorists about exciting applications to AI.

Decision Theory = Probability + Utility Theory
+
Universal Induction = Ockham + Bayes + Turing
=
A Unified View of Artificial Intelligence

Exact Bayesian Regression of Piecewise Constant Functions

Exact Bayesian Regression of Piecewise Constant Functions Exact Bayesian Regression of Piecewise Constant Functions Marcus Hutter RSISE @ ANU and SML @ NICTA Canberra, ACT, 2, Australia marcus@hutter.net 4 May 27 Abstract www.hutter.net We derive an exact and

More information

Exact Bayesian Regression of Piecewise Constant Functions

Exact Bayesian Regression of Piecewise Constant Functions Bayesian Analysis (2007) 2, Number 4, pp. 635 664 Exact Bayesian Regression of Piecewise Constant Functions Marcus Hutter Abstract. 1 We derive an exact and efficient Bayesian regression algorithm for

More information

Fast Non-Parametric Bayesian Inference on Infinite Trees

Fast Non-Parametric Bayesian Inference on Infinite Trees Marcus Hutter - 1 - Fast Bayesian Inference on Trees Fast Non-Parametric Bayesian Inference on Infinite Trees Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza Artificiale IDSIA, Galleria 2,

More information

Online Prediction: Bayes versus Experts

Online Prediction: Bayes versus Experts Marcus Hutter - 1 - Online Prediction Bayes versus Experts Online Prediction: Bayes versus Experts Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza Artificiale IDSIA, Galleria 2, CH-6928 Manno-Lugano,

More information

Universal Convergence of Semimeasures on Individual Random Sequences

Universal Convergence of Semimeasures on Individual Random Sequences Marcus Hutter - 1 - Convergence of Semimeasures on Individual Sequences Universal Convergence of Semimeasures on Individual Random Sequences Marcus Hutter IDSIA, Galleria 2 CH-6928 Manno-Lugano Switzerland

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

COMP 551 Applied Machine Learning Lecture 19: Bayesian Inference

COMP 551 Applied Machine Learning Lecture 19: Bayesian Inference COMP 551 Applied Machine Learning Lecture 19: Bayesian Inference Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise noted, all material posted

More information

Optimal normalization of DNA-microarray data

Optimal normalization of DNA-microarray data Optimal normalization of DNA-microarray data Daniel Faller 1, HD Dr. J. Timmer 1, Dr. H. U. Voss 1, Prof. Dr. Honerkamp 1 and Dr. U. Hobohm 2 1 Freiburg Center for Data Analysis and Modeling 1 F. Hoffman-La

More information

On the Foundations of Universal Sequence Prediction

On the Foundations of Universal Sequence Prediction Marcus Hutter - 1 - Universal Sequence Prediction On the Foundations of Universal Sequence Prediction Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza Artificiale IDSIA, Galleria 2, CH-6928

More information

Ch 4. Linear Models for Classification

Ch 4. Linear Models for Classification Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,

More information

Bayesian Gaussian / Linear Models. Read Sections and 3.3 in the text by Bishop

Bayesian Gaussian / Linear Models. Read Sections and 3.3 in the text by Bishop Bayesian Gaussian / Linear Models Read Sections 2.3.3 and 3.3 in the text by Bishop Multivariate Gaussian Model with Multivariate Gaussian Prior Suppose we model the observed vector b as having a multivariate

More information

Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decisions

Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decisions Marcus Hutter - 1 - Universal Artificial Intelligence Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decisions Marcus Hutter Istituto Dalle Molle

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Theoretically Optimal Program Induction and Universal Artificial Intelligence. Marcus Hutter

Theoretically Optimal Program Induction and Universal Artificial Intelligence. Marcus Hutter Marcus Hutter - 1 - Optimal Program Induction & Universal AI Theoretically Optimal Program Induction and Universal Artificial Intelligence Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza

More information

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative

More information

INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP

INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP Personal Healthcare Revolution Electronic health records (CFH) Personal genomics (DeCode, Navigenics, 23andMe) X-prize: first $10k human genome technology

More information

Lecture 6. Regression

Lecture 6. Regression Lecture 6. Regression Prof. Alan Yuille Summer 2014 Outline 1. Introduction to Regression 2. Binary Regression 3. Linear Regression; Polynomial Regression 4. Non-linear Regression; Multilayer Perceptron

More information

Linear models for the joint analysis of multiple. array-cgh profiles

Linear models for the joint analysis of multiple. array-cgh profiles Linear models for the joint analysis of multiple array-cgh profiles F. Picard, E. Lebarbier, B. Thiam, S. Robin. UMR 5558 CNRS Univ. Lyon 1, Lyon UMR 518 AgroParisTech/INRA, F-75231, Paris Statistics for

More information

A Bayesian Criterion for Clustering Stability

A Bayesian Criterion for Clustering Stability A Bayesian Criterion for Clustering Stability B. Clarke 1 1 Dept of Medicine, CCS, DEPH University of Miami Joint with H. Koepke, Stat. Dept., U Washington 26 June 2012 ISBA Kyoto Outline 1 Assessing Stability

More information

Clustering using Mixture Models

Clustering using Mixture Models Clustering using Mixture Models The full posterior of the Gaussian Mixture Model is p(x, Z, µ,, ) =p(x Z, µ, )p(z )p( )p(µ, ) data likelihood (Gaussian) correspondence prob. (Multinomial) mixture prior

More information

Minimum Message Length Analysis of the Behrens Fisher Problem

Minimum Message Length Analysis of the Behrens Fisher Problem Analysis of the Behrens Fisher Problem Enes Makalic and Daniel F Schmidt Centre for MEGA Epidemiology The University of Melbourne Solomonoff 85th Memorial Conference, 2011 Outline Introduction 1 Introduction

More information

Multivariate Bayesian Linear Regression MLAI Lecture 11

Multivariate Bayesian Linear Regression MLAI Lecture 11 Multivariate Bayesian Linear Regression MLAI Lecture 11 Neil D. Lawrence Department of Computer Science Sheffield University 21st October 2012 Outline Univariate Bayesian Linear Regression Multivariate

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Making rating curves - the Bayesian approach

Making rating curves - the Bayesian approach Making rating curves - the Bayesian approach Rating curves what is wanted? A best estimate of the relationship between stage and discharge at a given place in a river. The relationship should be on the

More information

Notes on Discriminant Functions and Optimal Classification

Notes on Discriminant Functions and Optimal Classification Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem

More information

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Nonlinear Filtering. With Polynomial Chaos. Raktim Bhattacharya. Aerospace Engineering, Texas A&M University uq.tamu.edu

Nonlinear Filtering. With Polynomial Chaos. Raktim Bhattacharya. Aerospace Engineering, Texas A&M University uq.tamu.edu Nonlinear Filtering With Polynomial Chaos Raktim Bhattacharya Aerospace Engineering, Texas A&M University uq.tamu.edu Nonlinear Filtering with PC Problem Setup. Dynamics: ẋ = f(x, ) Sensor Model: ỹ = h(x)

More information

Bayesian methods in economics and finance

Bayesian methods in economics and finance 1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Bayesian Methods. David S. Rosenberg. New York University. March 20, 2018

Bayesian Methods. David S. Rosenberg. New York University. March 20, 2018 Bayesian Methods David S. Rosenberg New York University March 20, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 March 20, 2018 1 / 38 Contents 1 Classical Statistics 2 Bayesian

More information

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Outline Lecture 2 2(32)

Outline Lecture 2 2(32) Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic

More information

CLASS NOTES Models, Algorithms and Data: Introduction to computing 2018

CLASS NOTES Models, Algorithms and Data: Introduction to computing 2018 CLASS NOTES Models, Algorithms and Data: Introduction to computing 208 Petros Koumoutsakos, Jens Honore Walther (Last update: June, 208) IMPORTANT DISCLAIMERS. REFERENCES: Much of the material (ideas,

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Uncertainty Quantification for Inverse Problems. November 7, 2011

Uncertainty Quantification for Inverse Problems. November 7, 2011 Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

New Statistical Methods That Improve on MLE and GLM Including for Reserve Modeling GARY G VENTER

New Statistical Methods That Improve on MLE and GLM Including for Reserve Modeling GARY G VENTER New Statistical Methods That Improve on MLE and GLM Including for Reserve Modeling GARY G VENTER MLE Going the Way of the Buggy Whip Used to be gold standard of statistical estimation Minimum variance

More information

Lecture 5: GPs and Streaming regression

Lecture 5: GPs and Streaming regression Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X

More information

Neutron inverse kinetics via Gaussian Processes

Neutron inverse kinetics via Gaussian Processes Neutron inverse kinetics via Gaussian Processes P. Picca Politecnico di Torino, Torino, Italy R. Furfaro University of Arizona, Tucson, Arizona Outline Introduction Review of inverse kinetics techniques

More information

L-momenty s rušivou regresí

L-momenty s rušivou regresí L-momenty s rušivou regresí Jan Picek, Martin Schindler e-mail: jan.picek@tul.cz TECHNICKÁ UNIVERZITA V LIBERCI ROBUST 2016 J. Picek, M. Schindler, TUL L-momenty s rušivou regresí 1/26 Motivation 1 Development

More information

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation

More information

L 0 methods. H.J. Kappen Donders Institute for Neuroscience Radboud University, Nijmegen, the Netherlands. December 5, 2011.

L 0 methods. H.J. Kappen Donders Institute for Neuroscience Radboud University, Nijmegen, the Netherlands. December 5, 2011. L methods H.J. Kappen Donders Institute for Neuroscience Radboud University, Nijmegen, the Netherlands December 5, 2 Bert Kappen Outline George McCullochs model The Variational Garrote Bert Kappen L methods

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing

Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing George Papandreou and Alan Yuille Department of Statistics University of California, Los Angeles ICCV Workshop on Information

More information

Convergence and Error Bounds for Universal Prediction of Nonbinary Sequences

Convergence and Error Bounds for Universal Prediction of Nonbinary Sequences Technical Report IDSIA-07-01, 26. February 2001 Convergence and Error Bounds for Universal Prediction of Nonbinary Sequences Marcus Hutter IDSIA, Galleria 2, CH-6928 Manno-Lugano, Switzerland marcus@idsia.ch

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood

More information

A Bayesian perspective on GMM and IV

A Bayesian perspective on GMM and IV A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all

More information

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017 CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Outline lecture 2 2(30)

Outline lecture 2 2(30) Outline lecture 2 2(3), Lecture 2 Linear Regression it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic Control

More information

Physics 509: Bootstrap and Robust Parameter Estimation

Physics 509: Bootstrap and Robust Parameter Estimation Physics 509: Bootstrap and Robust Parameter Estimation Scott Oser Lecture #20 Physics 509 1 Nonparametric parameter estimation Question: what error estimate should you assign to the slope and intercept

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

Post-selection Inference for Changepoint Detection

Post-selection Inference for Changepoint Detection Post-selection Inference for Changepoint Detection Sangwon Hyun (Justin) Dept. of Statistics Advisors: Max G Sell, Ryan Tibshirani Committee: Will Fithian (UC Berkeley), Alessandro Rinaldo, Kathryn Roeder,

More information

Probabilistic numerics for deep learning

Probabilistic numerics for deep learning Presenter: Shijia Wang Department of Engineering Science, University of Oxford rning (RLSS) Summer School, Montreal 2017 Outline 1 Introduction Probabilistic Numerics 2 Components Probabilistic modeling

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Applying Bayesian Estimation to Noisy Simulation Optimization

Applying Bayesian Estimation to Noisy Simulation Optimization Applying Bayesian Estimation to Noisy Simulation Optimization Geng Deng Michael C. Ferris University of Wisconsin-Madison INFORMS Annual Meeting Pittsburgh 2006 Simulation-based optimization problem Computer

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information
