Bayesian Regression of Piecewise Constant Functions
Slide 1: Bayesian Regression of Piecewise Constant Functions
Marcus Hutter
Istituto Dalle Molle di Studi sull'Intelligenza Artificiale
IDSIA, Galleria 2, CH-6928 Manno-Lugano, Switzerland
marcus@idsia.ch, marcus
Valencia ISBA, 1-6 June 2006
Slide 2: Table of Contents
- Bayesian Regression
- Quantities of Interest
- Efficient Solutions by Dynamic Programming
- Determination of the Hyper-Parameters
- Example: Gene Expression Data
- Extensions
- Summary
Slide 3: Abstract
I derive an exact and efficient Bayesian regression algorithm for piecewise constant functions of unknown segment number, boundary locations, and levels. It works for any noise and segment level prior, e.g. Cauchy, which can handle outliers. I derive simple but good estimates for the in-segment variance. I also propose a Bayesian regression curve as a better way of smoothing data without blurring boundaries. The Bayesian approach also allows straightforward determination of the evidence, break probabilities, and error estimates, useful for model selection and for significance and robustness studies. I present an application to microarray-CGH data analysis. Many possible extensions will be discussed.
Keywords: Bayesian regression, exact polynomial algorithm, non-parametric inference, piecewise constant function, dynamic programming, application, microarray-CGH data.
Slide 4: Advantages of Bayesian Regression
- Very principled, hence involves fewer heuristic design choices. Important for estimating the number of segments.
- One can decide among competing models solely on the evidence.
- Bayes often works well in theory and practice.
- Probability estimates and variances for quantities of interest.
- Bayesian regression curve (better than local smoothing, which wiggles more and blurs jumps).
Slide 5: Setup / Likelihood
True function $f = (f_1,\ldots,f_n)$ has $k$ segments with boundaries $0 = t_0 < t_1 < \ldots < t_{k-1} < t_k = n$, i.e. $f$ is constant on $\{t_{q-1}+1,\ldots,t_q\}$ for each $0 < q \le k$.
Noisy observations $y = (y_1,\ldots,y_n)$: any independent noise with mean $\mu_q$ and variance $\sigma^2$ in segment $q$.
[Figure: Data, Break-Probability, PC-Regression, and STD($\mu_t|y$) for an example data set.]
Likelihood: $P(y|\mu,\sigma) = \prod_{q=1}^{k} \prod_{i=t_{q-1}+1}^{t_q} P(y_i|\mu_q,\sigma)$
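To make the likelihood concrete, here is a minimal Python sketch that generates data from a piecewise constant function with Gaussian noise and evaluates the log-likelihood above for a given segmentation. The three-segment configuration and all names are hypothetical illustrations, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-segment example: boundaries t = (0, 30, 70, 100), levels mu.
n, t, mu, sigma = 100, [0, 30, 70, 100], [0.0, 1.5, -0.5], 0.4
f = np.concatenate([np.full(t[q + 1] - t[q], mu[q]) for q in range(len(mu))])
y = f + sigma * rng.standard_normal(n)        # independent Gaussian noise

def log_likelihood(y, t, mu, sigma):
    """log P(y | mu, t, k): the double product over segments and points."""
    total = 0.0
    for q in range(len(mu)):
        seg = y[t[q]:t[q + 1]]                # points y_{t_{q-1}+1} .. y_{t_q}
        total += np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                        - (seg - mu[q]) ** 2 / (2 * sigma ** 2))
    return total

print(log_likelihood(y, t, mu, sigma))        # log-likelihood at the truth
```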
Slide 6: Bayesian Regression
Goal: estimate segment levels $\mu = (\mu_1,\ldots,\mu_k)$, boundaries $t = (t_0,\ldots,t_k)$, and their number $k$.
Bayesian regression: assume a prior $P(\mu,t,k)$.
Compute posterior: $P(\mu,t,k|y) = \dfrac{P(y|\mu,t,k)\,P(\mu,t,k)}{P(y)}$
Evidence: $P(y) = \sum_{k,t} \int P(y|\mu,t,k)\,P(\mu,t,k)\,d\mu$
Too complex: we need summaries like the mean or MAP.
Slide 7: Prior
We model the level of each segment by a broad (e.g. Gaussian) distribution $P(\mu_q|\nu,\rho)$.
Uniform distribution among all segmentations into $k$ segments: $P(t|k) = \binom{n-1}{k-1}^{-1}$
Uniform prior over segment number $k$: $P(k) = 1/n$
$\Rightarrow$ Prior: $P(\mu,t,k) = \prod_{q=1}^{k} P(\mu_q|\nu,\rho) \cdot P(t|k) \cdot P(k)$
$(\rho,\nu,\sigma)$ are fixed hyper-parameters, determined later.
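A minimal sketch of ancestral sampling from this prior (the function name is mine): drawing $k-1$ distinct inner boundaries uniformly from $\{1,\ldots,n-1\}$ realizes the uniform distribution over the $\binom{n-1}{k-1}$ segmentations into $k$ segments:

```python
import numpy as np

def sample_prior(n, nu, rho, rng):
    """One draw (mu, t, k) from the slide's prior: k uniform on 1..n,
    boundaries uniform over the binom(n-1, k-1) segmentations into k
    segments, levels i.i.d. N(nu, rho^2)."""
    k = int(rng.integers(1, n + 1))                    # P(k) = 1/n
    inner = np.sort(rng.choice(np.arange(1, n), size=k - 1, replace=False))
    t = np.concatenate(([0], inner, [n]))              # 0 = t_0 < ... < t_k = n
    mu = nu + rho * rng.standard_normal(k)             # P(mu_q | nu, rho)
    return mu, t, k

mu, t, k = sample_prior(100, 0.0, 1.0, np.random.default_rng(1))
```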
Slide 8: Quantities of Interest
Number of segments: $\hat k = \arg\max_k P(k|y)$
Boundaries: $\hat t_q = \arg\max_{t_q} P(t_q|y,\hat k)$
Segment levels: $\hat\mu_q = E[\mu_q|y,\hat t,\hat k] = \int P(\mu_q|y,\hat t,\hat k)\,\mu_q\,d\mu_q$
The estimate $(\hat\mu,\hat t,\hat k)$ defines a (single) piecewise constant (PC) function $\hat f$, which is our estimate of $f$.
A (very) different quantity is to Bayes-average over all piecewise constant functions and to ask for the mean at location $i$ as an estimate for $f_i$:
Regression curve: $\hat\mu_i = E[\mu_i|y] = \int P(\mu_i|y)\,\mu_i\,d\mu_i$
Slide 9: Dynamic Programming
Dynamic programming: fix a break; then the data left and right of the break are independent.
Evidence and moments of a single segment from $i+1$ to $j$:
$A^r_{ij} := \int P(\mu_m) \prod_{t=i+1}^{j} P(y_t|\mu_m)\,\mu_m^r\,d\mu_m$
Analytical for exponential families with conjugate prior (like Gauss), numerical for others (like Cauchy).
$L_{kj} := P(y_1..y_j|k)$: evidence of the first $j$ data points, given $k$ segments.
$R_{ki} := P(y_{i+1}..y_n|k)$: evidence of the last $n-i$ data points, given $k$ segments.
Left recursion: evidence of $y_1..y_j$ with $k+1$ segments = evidence of $y_1..y_h$ with $k$ segments $\times$ single-segment evidence of $y_{h+1}..y_j$, summed over all locations $h$ of boundary $k$:
$L_{k+1,j} = \sum_{h=k}^{j-1} L_{kh}\,A^0_{hj}$
Similarly, right recursion: $R_{k+1,i} = \sum_{h=i+1}^{n-k} A^0_{ih}\,R_{kh}$
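The recursions are straightforward to implement. The sketch below is a minimal illustration (not the author's code) for the Gaussian case, where the conjugate Gaussian level prior $\mu \sim N(\nu,\rho^2)$ gives $A^0_{ij}$ in closed form by completing the square; both recursions run in log space for numerical stability:

```python
import numpy as np
from scipy.special import logsumexp

def log_A0(y, i, j, nu, rho, sigma):
    """log A^0_{ij}: marginal evidence of one segment y_{i+1}..y_j
    (y[i:j] in 0-based Python) under Gaussian noise N(mu, sigma^2)
    and conjugate level prior mu ~ N(nu, rho^2)."""
    seg = y[i:j]
    m = len(seg)
    ybar = seg.mean()
    S = ((seg - ybar) ** 2).sum()
    return (-0.5 * m * np.log(2 * np.pi * sigma ** 2)
            - S / (2 * sigma ** 2)
            - 0.5 * np.log(1 + m * rho ** 2 / sigma ** 2)
            - (ybar - nu) ** 2 / (2 * (sigma ** 2 / m + rho ** 2)))

def recursions(y, kmax, nu, rho, sigma):
    """Left/right recursions in log space:
    logL[k, j] = log P(y_1..y_j | k), logR[k, i] = log P(y_{i+1}..y_n | k)."""
    n = len(y)
    logA0 = np.full((n + 1, n + 1), -np.inf)
    for i in range(n):                          # O(n^3) for clarity; cumulative
        for j in range(i + 1, n + 1):           # sums would make this O(n^2)
            logA0[i, j] = log_A0(y, i, j, nu, rho, sigma)
    logL = np.full((kmax + 1, n + 1), -np.inf)
    logR = np.full((kmax + 1, n + 1), -np.inf)
    logL[0, 0] = logR[0, n] = 0.0               # zero segments, zero points
    for k in range(1, kmax + 1):
        for j in range(k, n + 1):               # L_{k,j} = sum_h L_{k-1,h} A^0_{hj}
            logL[k, j] = logsumexp(logL[k - 1, k - 1:j] + logA0[k - 1:j, j])
        for i in range(n - k + 1):              # R_{k,i} = sum_h A^0_{ih} R_{k-1,h}
            logR[k, i] = logsumexp(logA0[i, i + 1:n - k + 2]
                                   + logR[k - 1, i + 1:n - k + 2])
    return logL, logR, logA0
```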
Slide 10: Efficient Solutions for Quantities of Interest
Evidence: $P(y) = \sum_{k=1}^{n} P(y|k)P(k) = \frac{1}{n}\sum_{k=1}^{n} L_{kn}\binom{n-1}{k-1}^{-1}$
The posterior of $k$ and its MAP estimate are
$P(k|y) = \dfrac{L_{kn}\,\binom{n-1}{k-1}^{-1}}{k_{\max}\,E}$ (with $E = P(y)$) and $\hat k = \arg\max_{k=1..k_{\max}} P(k|y)$
The probability that boundary $p$ is located at $h$ is $P(t_p = h|y,\hat k) = L_{ph}\,R_{\hat k - p,h}\,/\,L_{\hat k n}$
MAP segment boundary: $\hat t_p := \arg\max_h P(t_p = h|y,\hat k)$
Segment level moments: $\mu^r_p = A^r_{\hat t_{p-1}\hat t_p} / A^0_{\hat t_{p-1}\hat t_p}$
Regression curve: fix a single segment $t_{m-1} = i,\ldots,t_m = j$ containing $t$; then $\mu_t = \mu_m$. Now sum over all such segments:
$E[\mu_t^r|y] = \frac{1}{L_{\hat k n}} \sum_{i < t \le j} \sum_{m=1}^{\hat k} L_{m-1,i}\,A^r_{ij}\,R_{\hat k - m,j}$
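Given the `logL` and `logR` arrays from a recursion like the sketch above, the segment-number posterior and the break probabilities become short array expressions. A hedged sketch; the helper names are mine, and a flat $P(k)$ (which cancels after normalization) is assumed as on slide 7:

```python
import numpy as np
from math import comb, log
from scipy.special import logsumexp

def segment_number_posterior(logL):
    """P(k|y) for k = 1..kmax from logL[k, n] = log L_kn."""
    kmax, n = logL.shape[0] - 1, logL.shape[1] - 1
    logpost = np.array([logL[k, n] - log(comb(n - 1, k - 1))
                        for k in range(1, kmax + 1)])
    return np.exp(logpost - logsumexp(logpost))   # flat P(k) cancels

def break_probability(logL, logR, khat, p):
    """P(t_p = h | y, khat) = L_{p,h} R_{khat-p,h} / L_{khat,n}, h = 1..n-1."""
    n = logL.shape[1] - 1
    return np.exp(logL[p, 1:n] + logR[khat - p, 1:n] - logL[khat, n])
```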
Slide 11: Determination of the Hyper-Parameters
Global variance $\rho^2$ and mean $\nu$ of $\mu$; in-segment variance $\sigma^2$.
(Empirical) Bayes: averaging over or maximizing $P(y|\sigma,\nu,\rho)$ is expensive.
Fast, semi-principled estimation:
Global mean: $\hat\nu \approx \frac{1}{n}\sum_{t=1}^{n} y_t$
Global variance: $\hat\rho^2 \approx \frac{1}{n}\sum_{t=1}^{n} (y_t - \hat\nu)^2$
The in-segment variance $\sigma^2$ is more tricky without knowing the segmentation:
$\hat\sigma^2 \approx \frac{1}{2(n-1)}\sum_{t=1}^{n-1} (y_{t+1} - y_t)^2$
Good for large noise. A crude estimate is enough if the noise is low (regression is easy then).
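These three estimators translate directly into code. A minimal sketch; the factor $2(n-1)$ in $\hat\sigma^2$ comes from $\mathrm{Var}(y_{t+1}-y_t) = 2\sigma^2$ within a segment, with the few cross-boundary differences contributing only a small bias:

```python
import numpy as np

def moment_hyperparams(y):
    """Fast moment-based hyper-parameter estimates from the slide."""
    nu = y.mean()                                 # global mean
    rho2 = ((y - nu) ** 2).mean()                 # global variance
    sigma2 = ((y[1:] - y[:-1]) ** 2).sum() / (2 * (len(y) - 1))
    return nu, rho2, sigma2
```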
Slide 12: Quartiles for Heavy-Tailed Robust Distributions
Let $[y]$ be the data vector $y$ sorted in ascending order.
Global median: $\hat\nu \approx [y]_{n/2}$
Global scale: $\hat\rho \approx \dfrac{[y]_{3n/4} - [y]_{n/4}}{2\alpha}$ with $\alpha \approx 1$
Differences $\Delta_t := y_{t+1} - y_t$.
In-segment scale: $\hat\sigma \approx \dfrac{[\Delta]_{3n/4} - [\Delta]_{n/4}}{2\beta}$ with $\beta \approx 1..2$
Iteratively improve these estimates if they are really not sufficient.
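A sketch of the quartile analogues for heavy-tailed noise (e.g. Cauchy), where mean and variance may not exist; the value beta = 1.5 is my choice inside the slide's suggested 1..2 range, an assumption rather than a prescribed constant:

```python
import numpy as np

def robust_hyperparams(y, alpha=1.0, beta=1.5):
    """Quartile-based scale estimates; beta = 1.5 is an assumed value
    inside the slide's 1..2 range, not prescribed by the talk."""
    nu = np.median(y)                              # global median [y]_{n/2}
    q1, q3 = np.quantile(y, [0.25, 0.75])
    rho = (q3 - q1) / (2 * alpha)                  # half-interquartile scale
    d1, d3 = np.quantile(np.diff(y), [0.25, 0.75])
    sigma = (d3 - d1) / (2 * beta)                 # in-segment scale
    return nu, rho, sigma
```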
Slide 13: Example: Gene Copy-Number Data
All genes in a healthy human cell come in pairs, but genes can be lost or multiplied in tumor cells. With modern micro-arrays one can measure the copy-number of genes along a chromosome. It is important to determine the breaks where the copy-number changes. The measurements are very noisy [Pinkel 98]. Hence this is a natural application for piecewise constant regression of noisy (one-dimensional) data. Regression results for one aberrant and one healthy chromosome (without biological interpretation) are shown next.
Slide 14: Aberrant Gene Copy-Number of a Chromosome
[Figure] Data (blue), PC-Regression (black), Break-Probability (red), and variance$^{1/2}$ = STD($\mu_t|y$) (green).
Slide 15: Aberrant Gene Copy-Number of a Chromosome
[Figure] Data with Bayesian regression curve ± 1 standard deviation.
Slide 16: Aberrant Gene Copy-Number of a Chromosome
[Figure: log(evidence) and ML #segments versus sqrt(in-segment variance)] $\log P(y)$ (blue) and $\hat k$ (green) as a function of $\sigma$, and our estimate $\hat\sigma$ of $\arg\max_\sigma P(y)$ and $\hat k(\hat\sigma)$ (black triangles).
Slide 17: Normal Gene Copy-Number of a Chromosome
[Figure] Data with Bayesian regression curve.
Slide 18: Posterior Segment Number Probability $P(k|y)$
[Figure: $\log P(k|y)$ versus $k$] For medium Gaussian noise (GM, black), high Cauchy noise (CH, blue), medium Cauchy noise with Gaussian regression (CMwG, green), aberrant gene expression of chromosome 1 (Gen(3,1), red), and normal gene expression of chromosome 9 (Gen(5,9), pink).
Slide 19: Synthetic Example: What do you see?
Slide 20: Synthetic Example: PC-Regression
[Figure] Data (blue), PC-Regression (black), Break-Probability (red), and Var($\mu_t|y$) (green).
The data was indeed sampled from a three-segment function with high Cauchy noise.
Slide 21: Synthetic Example: Bayesian Regression Curve
[Figure] Data with Bayesian regression curve ± 1 standard deviation.
Slide 22: Regression Summary of Gene and Other Examples
Setup codes: G = Gauss, C = Cauchy; L/M/H = low/medium/high noise; Gen = gene data; "XwY" = data with X noise regressed with a Y noise model.
Table columns: name; true noise scale $\sigma$; data size $n$; noise model $P$; global mean estimate $\hat\nu$; global deviation estimate $\hat\rho$; in-segment deviation estimate $\hat\sigma$; log-evidence $\log E = \log P(y)$; relative log-likelihood $ll$; $E[ll|\hat f] \pm Var[ll|\hat f]^{1/2}$; optimal number of segments $\hat k$; confidence $C_k(-1,+1) = P(\hat k(-1,+1)|y)$.
Surviving row entries (name, noise model $P$, confidence):
GL    G   %(0 20)
GM    G   %(0 29)
GH    G   %(10 12)
CL    C   %(0 21)
CM    C   %(0 27)
CH    C   %(11 11)
GMwC  C   %(0 26)
CMwG  G   %(8 8)
Gen   G   %(6 6)
Gen   G   %(0 6)
Slide 23: Extensions
- Any generalized one-segment evidence (no problem)
- Known segment levels (even easier)
- (Non)constant regressors (easy)
- Piecewise linear regression (easy)
- Continuous regression (harder, approximate)
- Non-parametric prior and noise (easy)
- Very large n (break into overlapping pieces; heuristic)
Slide 24: Related Work
- [Sen & Srivastava 75] Frequentist solution for detecting a single break.
- [Olshen et al. 04] Generalization to a pair of breaks; heuristic recursion for further remaining breaks.
- [Jon 03, Lavielle 05] Penalized maximum likelihood.
- [Endres & Földiák 05] Piecewise constant (PC) Bayesian density estimation.
- [Lav 05, EF 05] Dynamic programming.
Slide 25: Summary
- Full Bayesian PC-regression
- (Non)Gaussian noise and prior
- Handling of outliers
- Analytic estimate for the in-segment variance
- Bayesian regression curve
- Break probabilities and variances
- Global evidence for model comparison
- Principled, with few parameters to choose (important for the determination of k).
Slide 26: Thanks! Questions?
Details: Papers at marcus
The book intends to excite a broader AI audience about abstract Algorithmic Information Theory and to inform theorists about exciting applications to AI.
  Decision Theory = Probability + Utility Theory
+ Universal Induction = Ockham + Bayes + Turing
= A Unified View of Artificial Intelligence