Intelligent Systems I
|
|
- Alexander Turner
- 5 years ago
- Views:
Transcription
1 1, Intelligent Systems I 12 SAMPLING METHODS THE LAST THING YOU SHOULD EVER TRY Philipp Hennig & Stefan Harmeling Max Planck Institute for Intelligent Systems 23. January 2014 Dptmt. of Empirical Inference
2 2, Recap the course so far 1. intro: intelligence is reasoning under uncertainty 2. probability theory specifies the mathematics of uncertainty 3. graphical models: independence is crucial for probabilistic computations 4. Gaussians map probabilistic reasoning onto linear algebra 5. regression: Gaussian algebra allows learning functional relationships 6. kernels: it is possible to learn infinitely complex functions 7. classification: non-gaussian extension of regression, with approximate inference 8. SVM / kpca: special link function choices speed up inference 9. reproducing kernel Hilbert spaces: the capacity of kernel models 10. Statistical Learning Theory Bayes estimator is statistically optimal if prior correct. On VC-finite spaces, there are other methods (ERM) that converge to it for large data sets. 11. Neural Networks learning features, by maximum likelihood
3 3, Today probabilistic inference in intractable models probabilistic inference requires expectations, marginals f p = E p [f] = f(x)p(x) dx; p(y) = p(y x)p(x) dx Example: Marginalizing over hyperparameters, e.g. remember Gaussian process regression so far: p(f x X, y) = GP(f x X, y, θ)p(θ) dθ analytic integrals using Gaussian models ignore probability measure, only use maximum posterior/likelihood approximate posterior with simple Gaussian by Laplace approximation not covered in this course: more elaborate approximate posterior distributions using approximate inference, e.g. variational bounds, moment matching, kernel mean embeddings,...
4 samples from a probability distribution can be used to estimate expectations, roughly, without having to design an elaborate integration algorithm 4,
5 5, Sampling Methods aka Monte Carlo Methods [DJC MacKay, chapters 29 32] instead, the simplest thing to do : replace integral with sum: f(x)p(x)dx f(x i ); i p(x, y)dx p(y x i ); i if x i p(x) this requires being able to sample x i p(x) Definition 9.1 (Monte Carlo method) Algorithms that compute expectations in the above way, using samples x i p(x) are called Monte Carlo methods (Stanisław Ulam, John von Neumann). pictures:wikipedia
6 6, Why Sampling is a Good Idea sampling gives a rough estimate, fast The sampling estimator is unbiased ˆφ = 1 S s f(x s ) ˆφ = 1 S f(x s ) = φ s an unbiased estimator Its variance ( error ) drops over time ( ˆφ ˆφ ) 2 = [ 1 S s 2 (f(x s ) φ)] = 1 S 2 s f(x s )f(x r ) φf(x s ) f(x r )φ + φ 2 r = 1 S 2 (f(x s ) φ) 2 s = 1 S var(f)
7 7, sampling is for rough guesses a dumb way to learn π 1 ratio of in-circle to out-circle: π/4/1 π = 4 I(x x < 1)u(x) dx draw x u(x), check x x < 1, count 4 * sum(sum(rand(100,2).^2,2) < 1) / * sum(sum(rand(1e6,2).^2,2) < 1) / 1e
8 sampling is for rough guesses a dumb way to learn π var(f)/s # samples # samples need only 9 samples to get order of magnitude right (std(φ)/3) need samples for single-precision ( 10 7 ) calculations! sampling is good for rough estimates, not for precise calculations Always think of other options before trying to sample! 7,
9 8, samples from a probability distribution can be used to estimate expectations, roughly, without having to design an elaborate integration algorithm How do we generate samples from p(x)?
10 9, What is a Random Number? What is a sequence or random numbers?
11 9, What is a Random Number? What is a sequence or random numbers? dice, doubled
12 9, What is a Random Number? What is a sequence or random numbers? dice, doubled 41-70th digits of π
13 9, What is a Random Number? What is a sequence or random numbers? dice, doubled th digits of π th digits of 1/2(1 + 5)
14 9, What is a Random Number? What is a sequence or random numbers? dice, doubled th digits of π th digits of 1/2(1 + 5) von Neumann method, seed
15 9, What is a Random Number? What is a sequence or random numbers? dice, doubled 41-70th digits of π th digits of 1/2(1 + 5) von Neumann method, seed bits from G. Marsaglia s diehard CD for use in Monte Carlo, the important property is freedom from patterns (because it implies anytime unbiasedness) in fact, use for MC only really requires the right density, disorder is just helpful for the argument for use in cryptography, the important property is unpredictability randomness is a philosophically dodgy concept uncertainty is a much clearer idea (see first lectures)
16 10, Turning uniform samples into other distributions inverse transform sampling p(x) p(x) u, p(x) x requires P (x) = x p( x) d x requires solving x(u) = P 1 (u), for u Uniform[0, 1] tricky for multivariate distributions
17 Some special cases sampling from an exponential distribution is analytic 1 p(x) p(x) dx u, p(x) x p(x) = 1 λ e x/λ p(x) dx = 1 e x/λ 1 u = 1 e x/λ x = λ log(u) 11,
18 Some special cases the Box-Muller method for sampling from a Gaussian draw a square radius r 2 r 2 = 2 log u 1 p(r 2 ) = 1 2 draw an angle uniformly on unit circle write in Euclidean coordinates φ = 2πu 2 p(φ) = 1/(2π) exp ( r2 2 ) z 1 = r cos φ = 2 log u 1 cos(2πu 2 ) z 2 = r sin φ = 2 log u 1 sin(2πu 2 ) p(r 2, φ) d(r 2 ) dφ = 1 exp ( r2 2 2 ) d(r2 ) 1 2π dφ = 1 2π exp ( 1 2 r2 ) r dr dφ = 1 2π exp ( 1 2 (z2 1 + z2)) 2 dz 1 dz 2 = N (z 1 ; 0, 1)N (z 2 ; 0, 1) requires 1 unit random, 1 sin, 0.5 sqrt, 0.5 log per draw. GEP Box, ME Muller, Note on the Generation of Random Normal Deviates, Ann. Math. Stat., ,
19 13, The Box Muller method, graphically and why it requires two independent samples p(r) cos(φ) r sin(φ)
20 14, Specialised direct sampling methods exist It is worth thinking about hard problems if they are important For example Ziggurat Method: For monotonically decaying pdfs: Approximate PDF with an upper and lower bound step function, only spend computational effort on the difficult cases in between (currently the best way to draw normal samples) G. Marsaglia; W.W. Tsang The Ziggurat Method for Generating Random Variables J of Statistical Software 5 (8), 2000 Subgroup Algorithm: To generate uniformly random elements from a compact group, iteratively draw in an expanding chain of subgroups by drawing uniformly from the cosets (e.g. used to draw rotation, permutation matrices at random) P. Diaconis; M. Shahshahani The subgroup algorithm for generating uniform random variables Probability in the Engineering and Informational Sciences, 1987
21 15, samples from a probability distribution can be used to estimate expectations, roughly random numbers don t really need to be unpredictable, as long as they have as little structure as possible uniformly distributed random numbers can be transformed into other distributions. This can be done numerically efficiently in some cases, and it is worth thinking about doing so What do we do if we don t know a good transformation?
22 Rejection Sampling a simple method [Georges-Louis Leclerc, Comte de Buffon, ] for any p(x) = p(x)/z (normalizer Z not required) choose q(x) s.t. cq(x) p(x) draw s q(x), u Uniform[0, cq(s)] reject if u > p(s) 16,
23 17, The Problem with Rejection Sampling the curse of dimensionality [MacKay, 29.3] p(x) cq(x) Example: p(x) = N (x; 0, σp) 2 q(x) = N (x; 0, σq) 2 σ q > σ p optimal c = (2πσ2 q) D/2 (2πσ 2 p) D /2 = exp (D ln σ q σ p ) acceptance rate is ratio of volumes: 1/c rejection rate rises exponentially in D for σ q /σ p = 1.1, D = 100, 1/c < 10 4
24 18, Importance Sampling a slightly less simple method computing p(x), q(x), then throwing them away seems wasteful instead, rewrite (assume q(x) > 0 if p(x) > 0) φ = f(x)p(x) dx = f(x) p(x) q(x) dx q(x) 1 S f(x s ) p(x s) if x s q(x) s q(x s ) this is just using a new function g(x) = f(x)p(x)/q(x) so it is an unbiased estimator if normalization unknown, can also use p(x) = Zp(x) f(x)p(x) = 1 1 Z S f(x s ) p(x s) s q(x s ) dx = 1 S p(x s )/q(x s ) f(x s ) 1 s S s 1 p(x s)/q(x s ) = f(x s )w s s this is consistent, but biased, because of double estimation
25 19, What s wrong with Importance Sampling? the curse of dimensionality, revisited recall that var ˆφ = var(f)/s importance sampling replaces var(f) with var(g) = var (f p q ) var (f p ) can be very large if q p somewhere q in many dimensions, usually q p all but everywhere! if p has undiscovered islands, some samples have p(x)/q(x) 4 log 1 0 sample count x f(x), g(x)
26 20, Marginalizing Hyperparameters of a Gaussian process regression model 4 2 y x assume p(f λ) = GP(0, k(λ)), p(y f(x), σ) = N (y, f x, σ 2 ) with kernel k(x, x, λ) = exp( 1 /2(x x ) 2 /λ 2 ) we would like p(f y) = p(f y, σ, λ)p(σ, λ) dσ dλ = GP(f; µ y,x,σ,λ, k x,σ,λ )p(σ, λ) dσ dλ
27 21, Marginalizing Hyperparameters of a Gaussian process regression model λ σ 1 log likelihood log p(y σ, λ) = 1 2 y (k xx (λ)+σ 2 I)y 1 2 log k xx(λ)+σ 2 I N 2 log 2π
28 22, Marginalizing Hyperparameters of a Gaussian process regression model λ σ 1 proposal distribution p(σ, λ) = N [( λ σ ) ; ( ), (0.2 )] I(σ > 0)/Z with Z =
29 23, Marginalizing Hyperparameters of a Gaussian process regression model 4 2 f(x) x w 1 p(f λ 1, σ 1, x, y)
30 23, Marginalizing Hyperparameters of a Gaussian process regression model 4 2 f(x) x w 2 p(f λ 2, σ 2, x, y)
31 23, Marginalizing Hyperparameters of a Gaussian process regression model 4 2 f(x) x w 3 p(f λ 3, σ 3, x, y)
32 23, Marginalizing Hyperparameters of a Gaussian process regression model 4 2 f(x) x i w i p(f λ i, σ i ) p(f x, y)
33 24, Summary Sampling (Monte Carlo) Methods Sampling is a way of performing rough probabilistic computations, in particular for expectations (including marginalization). samples from a probability distribution can be used to estimate expectations, roughly random numbers don t really need to be unpredictable, as long as they have as little structure as possible uniformly distributed random numbers can be transformed into other distributions. This can be done numerically efficiently in some cases, and it is worth thinking about doing so Rejection sampling is a primitive but exact method that works with intractable models Importance sampling makes more efficient use of samples, but can have high variance (and this may not be obvious) In two weeks: Markov Chain Monte Carlo methods are more elaborate ways of getting approximate answers to intractable problems.
34 25, Next Week guest lecture on causality by Dominik Janzing (MPI IS).
35 26,
Bayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationLecture 7 and 8: Markov Chain Monte Carlo
Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani
More informationRandom processes and probability distributions. Phys 420/580 Lecture 20
Random processes and probability distributions Phys 420/580 Lecture 20 Random processes Many physical processes are random in character: e.g., nuclear decay (Poisson distributed event count) P (k, τ) =
More information16 : Markov Chain Monte Carlo (MCMC)
10-708: Probabilistic Graphical Models 10-708, Spring 2014 16 : Markov Chain Monte Carlo MCMC Lecturer: Matthew Gormley Scribes: Yining Wang, Renato Negrinho 1 Sampling from low-dimensional distributions
More informationLecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu
Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes
More informationProbability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014
Probability Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh August 2014 (All of the slides in this course have been adapted from previous versions
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of
More informationGenerating the Sample
STAT 80: Mathematical Statistics Monte Carlo Suppose you are given random variables X,..., X n whose joint density f (or distribution) is specified and a statistic T (X,..., X n ) whose distribution you
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationIntelligent Systems I
Intelligent Systems I 00 INTRODUCTION Stefan Harmeling & Philipp Hennig 24. October 2013 Max Planck Institute for Intelligent Systems Dptmt. of Empirical Inference Which Card? Opening Experiment Which
More informationMachine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart
Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural
More informationECE295, Data Assimila0on and Inverse Problems, Spring 2015
ECE295, Data Assimila0on and Inverse Problems, Spring 2015 1 April, Intro; Linear discrete Inverse problems (Aster Ch 1 and 2) Slides 8 April, SVD (Aster ch 2 and 3) Slides 15 April, RegularizaFon (ch
More informationMarkov Chain Monte Carlo (MCMC)
School of Computer Science 10-708 Probabilistic Graphical Models Markov Chain Monte Carlo (MCMC) Readings: MacKay Ch. 29 Jordan Ch. 21 Matt Gormley Lecture 16 March 14, 2016 1 Homework 2 Housekeeping Due
More informationApril 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning
for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions
More informationIntroduction to Bayesian Inference
University of Pennsylvania EABCN Training School May 10, 2016 Bayesian Inference Ingredients of Bayesian Analysis: Likelihood function p(y φ) Prior density p(φ) Marginal data density p(y ) = p(y φ)p(φ)dφ
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationPrediction of Data with help of the Gaussian Process Method
of Data with help of the Gaussian Process Method R. Preuss, U. von Toussaint Max-Planck-Institute for Plasma Physics EURATOM Association 878 Garching, Germany March, Abstract The simulation of plasma-wall
More informationNonparameteric Regression:
Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationMTH739U/P: Topics in Scientific Computing Autumn 2016 Week 6
MTH739U/P: Topics in Scientific Computing Autumn 16 Week 6 4.5 Generic algorithms for non-uniform variates We have seen that sampling from a uniform distribution in [, 1] is a relatively straightforward
More informationCOMP 551 Applied Machine Learning Lecture 20: Gaussian processes
COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55
More information14 : Approximate Inference Monte Carlo Methods
10-708: Probabilistic Graphical Models 10-708, Spring 2018 14 : Approximate Inference Monte Carlo Methods Lecturer: Kayhan Batmanghelich Scribes: Biswajit Paria, Prerna Chiersal 1 Introduction We have
More informationCSci 8980: Advanced Topics in Graphical Models Gaussian Processes
CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationSequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007
Sequential Monte Carlo and Particle Filtering Frank Wood Gatsby, November 2007 Importance Sampling Recall: Let s say that we want to compute some expectation (integral) E p [f] = p(x)f(x)dx and we remember
More informationGWAS V: Gaussian processes
GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011
More informationPerhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.
Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationCSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes
CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes Roger Grosse Roger Grosse CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes 1 / 55 Adminis-Trivia Did everyone get my e-mail
More informationA Bayesian Treatment of Linear Gaussian Regression
A Bayesian Treatment of Linear Gaussian Regression Frank Wood December 3, 2009 Bayesian Approach to Classical Linear Regression In classical linear regression we have the following model y β, σ 2, X N(Xβ,
More informationApproximate Inference using MCMC
Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov
More informationCOMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017
COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY
More information10. Exchangeability and hierarchical models Objective. Recommended reading
10. Exchangeability and hierarchical models Objective Introduce exchangeability and its relation to Bayesian hierarchical models. Show how to fit such models using fully and empirical Bayesian methods.
More informationUnivariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation
Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation PRE 905: Multivariate Analysis Spring 2014 Lecture 4 Today s Class The building blocks: The basics of mathematical
More informationMonte Carlo Methods. Jesús Fernández-Villaverde University of Pennsylvania
Monte Carlo Methods Jesús Fernández-Villaverde University of Pennsylvania 1 Why Monte Carlo? From previous chapter, we want to compute: 1. Posterior distribution: π ³ θ Y T,i = f(y T θ,i)π (θ i) R Θ i
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov
More informationReduction of Variance. Importance Sampling
Reduction of Variance As we discussed earlier, the statistical error goes as: error = sqrt(variance/computer time). DEFINE: Efficiency = = 1/vT v = error of mean and T = total CPU time How can you make
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More information17 : Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo
More informationSampling Distributions Allen & Tildesley pgs and Numerical Recipes on random numbers
Sampling Distributions Allen & Tildesley pgs. 345-351 and Numerical Recipes on random numbers Today I explain how to generate a non-uniform probability distributions. These are used in importance sampling
More informationCS 188: Artificial Intelligence. Bayes Nets
CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew
More informationModel Selection for Gaussian Processes
Institute for Adaptive and Neural Computation School of Informatics,, UK December 26 Outline GP basics Model selection: covariance functions and parameterizations Criteria for model selection Marginal
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationI. Bayesian econometrics
I. Bayesian econometrics A. Introduction B. Bayesian inference in the univariate regression model C. Statistical decision theory D. Large sample results E. Diffuse priors F. Numerical Bayesian methods
More informationMarkov Chains and MCMC
Markov Chains and MCMC CompSci 590.02 Instructor: AshwinMachanavajjhala Lecture 4 : 590.02 Spring 13 1 Recap: Monte Carlo Method If U is a universe of items, and G is a subset satisfying some property,
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationMachine Learning for Data Science (CS4786) Lecture 24
Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationGWAS IV: Bayesian linear (variance component) models
GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More information2 Random Variable Generation
2 Random Variable Generation Most Monte Carlo computations require, as a starting point, a sequence of i.i.d. random variables with given marginal distribution. We describe here some of the basic methods
More informationStat 5421 Lecture Notes Proper Conjugate Priors for Exponential Families Charles J. Geyer March 28, 2016
Stat 5421 Lecture Notes Proper Conjugate Priors for Exponential Families Charles J. Geyer March 28, 2016 1 Theory This section explains the theory of conjugate priors for exponential families of distributions,
More informationGaussian Processes in Machine Learning
Gaussian Processes in Machine Learning November 17, 2011 CharmGil Hong Agenda Motivation GP : How does it make sense? Prior : Defining a GP More about Mean and Covariance Functions Posterior : Conditioning
More informationMCMC notes by Mark Holder
MCMC notes by Mark Holder Bayesian inference Ultimately, we want to make probability statements about true values of parameters, given our data. For example P(α 0 < α 1 X). According to Bayes theorem:
More informationSTAT 518 Intro Student Presentation
STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationLECTURE NOTE #3 PROF. ALAN YUILLE
LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.
More informationLinear Classification
Linear Classification Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 23 April 2015 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationExpectation Propagation for Approximate Bayesian Inference
Expectation Propagation for Approximate Bayesian Inference José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department February 5, 2007 1/ 24 Bayesian Inference Inference Given
More informationIntroduction to Machine Learning
Introduction to Machine Learning 12. Gaussian Processes Alex Smola Carnegie Mellon University http://alex.smola.org/teaching/cmu2013-10-701 10-701 The Normal Distribution http://www.gaussianprocess.org/gpml/chapters/
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Probabilistic Models of Cognition, 2011 http://www.ipam.ucla.edu/programs/gss2011/ Roadmap: Motivation Monte Carlo basics What is MCMC? Metropolis Hastings and Gibbs...more tomorrow.
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More information1 Probabilities. 1.1 Basics 1 PROBABILITIES
1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability
More informationTDA231. Logistic regression
TDA231 Devdatt Dubhashi dubhashi@chalmers.se Dept. of Computer Science and Engg. Chalmers University February 19, 2016 Some data 5 x2 0 5 5 0 5 x 1 In the Bayes classifier, we built a model of each class
More informationMarkov Networks.
Markov Networks www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts Markov network syntax Markov network semantics Potential functions Partition function
More informationComputer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression
Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos Contents Markov Chain Monte Carlo Methods Sampling Rejection Importance Hastings-Metropolis Gibbs Markov Chains
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing
More informationThe Multivariate Gaussian Distribution [DRAFT]
The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,
More informationThe Origin of Deep Learning. Lili Mou Jan, 2015
The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationArtificial Intelligence
Artificial Intelligence Probabilities Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: AI systems need to reason about what they know, or not know. Uncertainty may have so many sources:
More informationAnswers and expectations
Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E
More informationDiscrete Mathematics and Probability Theory Fall 2015 Lecture 21
CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory
More information27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling
10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationBasic Sampling Methods
Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution
More information20: Gaussian Processes
10-708: Probabilistic Graphical Models 10-708, Spring 2016 20: Gaussian Processes Lecturer: Andrew Gordon Wilson Scribes: Sai Ganesh Bandiatmakuri 1 Discussion about ML Here we discuss an introduction
More informationLecture 8: PGM Inference
15 September 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I 1 Variable elimination Max-product Sum-product 2 LP Relaxations QP Relaxations 3 Marginal and MAP X1 X2 X3 X4
More informationLecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis
Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationLecture 5: GPs and Streaming regression
Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X
More informationProbabilistic Graphical Models Lecture 17: Markov chain Monte Carlo
Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 18, 2015 1 / 45 Resources and Attribution Image credits,
More informationCS 7140: Advanced Machine Learning
Instructor CS 714: Advanced Machine Learning Lecture 3: Gaussian Processes (17 Jan, 218) Jan-Willem van de Meent (j.vandemeent@northeastern.edu) Scribes Mo Han (han.m@husky.neu.edu) Guillem Reus Muns (reusmuns.g@husky.neu.edu)
More informationIntroduction to Machine Learning
Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB
More information